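A Kafka consumer reads messages by iterating over the ConsumerIterator of a KafkaStream. The iterator draws its data from the BlockingQueue assigned to that KafkaStream, and the queue is filled by the fetcher threads (ConsumerFetcherThread) that exist per Broker Server: each fetcher thread pulls the data of some of that broker's partitions and puts it on the BlockingQueue that corresponds to each partition. The data path is therefore Broker Server → ConsumerFetcherThread → BlockingQueue → KafkaStream/ConsumerIterator → consumer thread.
The class kafka.consumer.ZookeeperConsumerConnector provides all of this functionality. The relevant code is as follows: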
private[kafka] class ZookeeperConsumerConnector(
    val config: ConsumerConfig,
    val enableFetcher: Boolean) extends ConsumerConnector with Logging with KafkaMetricsGroup {
  /*
   * Creates the message streams. topicCountMap tells Kafka how many threads this
   * consumer will use for each topic: the key is the topic name and the value is
   * the number of threads for that topic.
   */
  def createMessageStreams[K,V](topicCountMap: Map[String,Int], keyDecoder: Decoder[K], valueDecoder: Decoder[V])
      : Map[String, List[KafkaStream[K,V]]] = {
    consume(topicCountMap, keyDecoder, valueDecoder)
  }
  def consume[K, V](
      topicCountMap: scala.collection.Map[String,Int],
      keyDecoder: Decoder[K],
      valueDecoder: Decoder[V])
      : Map[String,List[KafkaStream[K,V]]] = {
    debug("entering consume ")
    if (topicCountMap == null)
      throw new RuntimeException("topicCountMap is null")
    val topicCount = TopicCount.constructTopicCount(consumerIdString, topicCountMap)
    val topicThreadIds = topicCount.getConsumerThreadIdsPerTopic
    // make a list of (queue,stream) pairs, one pair for each threadId
    val queuesAndStreams = topicThreadIds.values.map(threadIdSet =>
      threadIdSet.map(_ => {
        val queue = new LinkedBlockingQueue[FetchedDataChunk](config.queuedMaxMessages)
        val stream = new KafkaStream[K,V](
          queue, config.consumerTimeoutMs, keyDecoder, valueDecoder, config.clientId)
        (queue, stream)
      })
    ).flatten.toList
    val dirs = new ZKGroupDirs(config.groupId)
    registerConsumerInZK(dirs, consumerIdString, topicCount)
    reinitializeConsumer(topicCount, queuesAndStreams)
    loadBalancerListener.kafkaMessageAndMetadataStreams.asInstanceOf[Map[String, List[KafkaStream[K,V]]]]
  }
}
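To make the queuesAndStreams step concrete, here is a minimal, self-contained sketch of how one bounded queue ends up existing per (topic, thread id) pair. It does not use the Kafka classes: threadIdsPerTopic is a hypothetical stand-in for TopicCount.getConsumerThreadIdsPerTopic, the "<consumerId>-<index>" naming is an assumption, and plain String elements replace FetchedDataChunk.

```scala
import java.util.concurrent.LinkedBlockingQueue

object QueuePerThreadSketch {
  // Hypothetical stand-in for TopicCount.getConsumerThreadIdsPerTopic:
  // one thread id per requested consumer thread, named "<consumerId>-<index>" (assumed format).
  def threadIdsPerTopic(consumerId: String, topicCountMap: Map[String, Int]): Map[String, Set[String]] =
    topicCountMap.map { case (topic, threadCount) =>
      topic -> (0 until threadCount).map(i => s"$consumerId-$i").toSet
    }

  // One bounded queue per (topic, threadId) pair, mirroring the shape of the
  // queuesAndStreams construction in consume(). String stands in for FetchedDataChunk.
  def queuesFor(
      ids: Map[String, Set[String]],
      capacity: Int): Map[(String, String), LinkedBlockingQueue[String]] =
    ids.toSeq.flatMap { case (topic, threadIds) =>
      threadIds.toSeq.map(threadId => (topic, threadId) -> new LinkedBlockingQueue[String](capacity))
    }.toMap
}
```

With a topicCountMap of Map("a" -> 2, "b" -> 1) this yields three queues, which is why the number of streams handed back to the application equals the sum of the thread counts.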
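The rest of this article covers four topics:
(1) The algorithms that assign partitions to ConsumerThreads
(2) How the fetcher threads (ConsumerFetcherThread) are started
(3) How KafkaStream iterates over the BlockingQueue
(4) The load-balancing flow of KafkaStream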
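Assignment algorithms for ConsumerThread and Partition
A ConsumerThread is simply a consumer thread on the client side. Each one consumes data from zero or more partitions, and ConsumerThreads and BlockingQueues correspond one to one, so once the mapping from ConsumerThread to partition is fixed, the mapping from BlockingQueue to partition is fixed as well.
Kafka provides two algorithms for assigning partitions to ConsumerThreads: Range and RoundRobin. The partition.assignment.strategy parameter selects between them, and the default is range.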
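The difference between the two strategies is easiest to see on a small input. The following is a simplified sketch for a single topic, not the Kafka implementation: partitions are plain Ints, thread ids are Strings, both are ordered by their natural sort order, and the object and method names are hypothetical. Range gives each thread a contiguous block (the first threads get one extra partition when the division is uneven), while RoundRobin deals partitions out one at a time.

```scala
object AssignmentSketch {
  // Range: contiguous blocks; the first (size % threads) threads receive one extra partition.
  def range(partitions: Seq[Int], threads: Seq[String]): Map[String, Seq[Int]] = {
    require(threads.nonEmpty, "at least one consumer thread is required")
    val sortedParts = partitions.sorted
    val sortedThreads = threads.sorted
    val perThread = sortedParts.size / sortedThreads.size
    val extra = sortedParts.size % sortedThreads.size
    sortedThreads.zipWithIndex.map { case (thread, pos) =>
      val start = perThread * pos + math.min(pos, extra)
      val count = perThread + (if (pos < extra) 1 else 0)
      thread -> sortedParts.slice(start, start + count)
    }.toMap
  }

  // RoundRobin: partition i goes to thread (i mod number of threads).
  def roundRobin(partitions: Seq[Int], threads: Seq[String]): Map[String, Seq[Int]] = {
    require(threads.nonEmpty, "at least one consumer thread is required")
    val sortedThreads = threads.sorted
    val empty = sortedThreads.map(_ -> Seq.empty[Int]).toMap
    partitions.sorted.zipWithIndex.foldLeft(empty) { case (acc, (partition, i)) =>
      val thread = sortedThreads(i % sortedThreads.size)
      acc.updated(thread, acc(thread) :+ partition)
    }
  }
}
```

For five partitions and the threads c-0 and c-1, range assigns 0, 1, 2 to c-0 and 3, 4 to c-1, whereas roundRobin assigns 0, 2, 4 to c-0 and 1, 3 to c-1. With more threads than partitions, the surplus threads receive nothing, which matches the "zero or more partitions" case described above.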
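ConsumerFetcherThread
Once the mapping between this consumer instance's ConsumerThreads and partitions is settled, ConsumerFetcherThreads must be started to pull messages from the Broker Servers. A ConsumerFetcherThread forwards the partition data it fetches to the corresponding BlockingQueue, where the ConsumerThread consumes it.
Before it can fetch, the client has to know which Broker Server hosts the Leader Replica of each partition, so it sends a TopicMetadataRequest to the Kafka cluster to obtain the metadata of the relevant topics. This work is done by the LeaderFinderThread: it locates the Broker Server that holds each partition's Leader Replica and, once found, tells the corresponding ConsumerFetcherThread to start fetching that partition.
In Kafka, the class ConsumerFetcherManager manages the ConsumerFetcherThread threads.
1. Starting ConsumerFetcherThread
ConsumerFetcherManager creates the LeaderFinderThread when it starts. Its noLeaderPartitionSet holds the TopicAndPartitions whose Leader Replica is not yet known. LeaderFinderThread takes the TopicAndPartitions from noLeaderPartitionSet and then goes through the Broker Servers, sending them metadata requests. The code is as follows: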
private class LeaderFinderThread(name: String) extends ShutdownableThread(name) {
  override def doWork() {
    // Broker Server that hosts the Leader Replica of each TopicAndPartition resolved in this round
    val leaderForPartitionsMap = new HashMap[TopicAndPartition, Broker]
    lock.lock()
    try {
      // Wait while no partition is missing a leader
      while (noLeaderPartitionSet.isEmpty) {
        trace("No partition for leader election.")
        cond.await()
      }
      // Get all brokers from ZooKeeper
      val brokers = getAllBrokersInCluster(zkClient)
      /*
       * Go through the brokers to look up the metadata of the topics in
       * noLeaderPartitionSet; an exception is thrown if it cannot be found
       */
      val topicsMetadata = ClientUtils.fetchTopicMetadata(
        noLeaderPartitionSet.map(m => m.topic).toSet,
        brokers,
        config.clientId,
        config.socketTimeoutMs,
        correlationId.getAndIncrement).topicsMetadata
      // Record the leader of every pending partition that now has one
      topicsMetadata.foreach { tmd =>
        val topic = tmd.topic
        tmd.partitionsMetadata.foreach { pmd =>
          val topicAndPartition = TopicAndPartition(topic, pmd.partitionId)
          if(pmd.leader.isDefined && noLeaderPartitionSet.contains(topicAndPartition)) {
            val leaderBroker = pmd.leader.get
            // Update leaderForPartitionsMap
            leaderForPartitionsMap.put(topicAndPartition, leaderBroker)
            // Remove the topicAndPartition from the pending set
            noLeaderPartitionSet -= topicAndPartition
          }
        }
      }
    } catch {
      case t: Throwable => {
        if (!isRunning.get())
          throw t /* If this thread is stopped, propagate this exception to kill the thread. */
        else
          warn("Failed to find leader for %s".format(noLeaderPartitionSet), t)
      }
    } finally {
      lock.unlock()
    }
    try {
      // Add each topicAndPartition to the ConsumerFetcherThread of its leader broker, creating the thread if it does not exist
      addFetcherForPartitions(leaderForPartitionsMap.map{
        case (topicAndPartition, broker) =>
          topicAndPartition -> BrokerAndInitialOffset(broker, partitionMap(topicAndPartition).getFetchOffset())}
      )
    } catch {
      case t: Throwable => {
        if (!isRunning.get())
          throw t /* If this thread is stopped, propagate this exception to kill the thread. */
        else {
          warn("Failed to add leader for partitions %s; will retry".format(leaderForPartitionsMap.keySet.mkString(",")), t)
          lock.lock()
          noLeaderPartitionSet ++= leaderForPartitionsMap.keySet
          lock.unlock()
        }
      }
    }
    shutdownIdleFetcherThreads()
    Thread.sleep(config.refreshLeaderBackoffMs)
  }
}
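The core of doWork is a lock-and-condition pattern around the pending set: block until something is pending, resolve what can be resolved, and leave the rest for the next round. The sketch below isolates that pattern without any Kafka dependency. PendingPartitions, add, resolveOnce and pending are hypothetical names, a (topic, partition) tuple stands in for TopicAndPartition, and the knownLeaders map stands in for the metadata returned by the brokers.

```scala
import java.util.concurrent.locks.ReentrantLock
import scala.collection.mutable

object LeaderFinderSketch {
  type TopicAndPartition = (String, Int)

  final class PendingPartitions {
    private val lock = new ReentrantLock()
    private val cond = lock.newCondition()
    private val noLeaderPartitionSet = mutable.Set.empty[TopicAndPartition]

    // Register partitions that still need a leader and wake up the finder.
    def add(partitions: Iterable[TopicAndPartition]): Unit = {
      lock.lock()
      try {
        noLeaderPartitionSet ++= partitions
        cond.signalAll()
      } finally lock.unlock()
    }

    // One round of the finder: wait until something is pending, move every
    // partition with a known leader out of the pending set, and return those.
    def resolveOnce(knownLeaders: Map[TopicAndPartition, Int]): Map[TopicAndPartition, Int] = {
      lock.lock()
      try {
        while (noLeaderPartitionSet.isEmpty) cond.await()
        val found = noLeaderPartitionSet.toSeq
          .flatMap(tp => knownLeaders.get(tp).map(brokerId => tp -> brokerId))
          .toMap
        noLeaderPartitionSet --= found.keys
        found
      } finally lock.unlock()
    }

    // Snapshot of the partitions that are still waiting for a leader.
    def pending: Set[TopicAndPartition] = {
      lock.lock()
      try noLeaderPartitionSet.toSet finally lock.unlock()
    }
  }
}
```

Partitions whose leader is still unknown simply stay in the set, so the next round retries them after the backoff, which is the same effect the Kafka code achieves with Thread.sleep(config.refreshLeaderBackoffMs).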
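addFetcherForPartitions, which doWork calls once the leaders are known, is responsible for starting the ConsumerFetcherThread. If the thread is already running, it uses the topicAndPartition and offset to update the thread's internal partitionMap, which maps each topicAndPartition to its fetch offset.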
def addFetcherForPartitions(partitionAndOffsets: Map[TopicAndPartition, BrokerAndInitialOffset]) {
  mapLock synchronized {
    // Group partitionAndOffsets by BrokerAndFetcherId
    val partitionsPerFetcher = partitionAndOffsets.groupBy{ case(topicAndPartition, brokerAndInitialOffset) =>
      BrokerAndFetcherId(brokerAndInitialOffset.broker, getFetcherId(topicAndPartition.topic, topicAndPartition.partition))}
    for ((brokerAndFetcherId, partitionAndOffsets) <- partitionsPerFetcher) {
      var fetcherThread: AbstractFetcherThread = null
      // fetcherThreadMap maps each brokerAndFetcherId to its ConsumerFetcherThread
      fetcherThreadMap.get(brokerAndFetcherId) match {
        case Some(f) => fetcherThread = f
        // Create the ConsumerFetcherThread if it does not exist yet
        case None =>
          fetcherThread = createFetcherThread(brokerAndFetcherId.fetcherId, brokerAndFetcherId.broker)
          fetcherThreadMap.put(brokerAndFetcherId, fetcherThread)
          fetcherThread.start
      }
      fetcherThreadMap(brokerAndFetcherId).addPartitions(partitionAndOffsets.map { case (topicAndPartition, brokerAndInitOffset) =>
        topicAndPartition -> brokerAndInitOffset.initOffset
      })
    }
  }
  info("Added fetcher for partitions %s".format(partitionAndOffsets.map{ case (topicAndPartition, brokerAndInitialOffset) =>
    "[" + topicAndPartition + ", initOffset " + brokerAndInitialOffset.initOffset + " to broker " + brokerAndInitialOffset.broker + "] "}))
}
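The grouping step decides how many fetcher threads exist: one per distinct (broker, fetcher id) pair. A minimal sketch of that grouping follows. The case classes here are local stand-ins for the Kafka types of the same name, a broker is reduced to its Int id, and the hashing rule inside fetcherId is an assumption made for illustration, not necessarily what getFetcherId does.

```scala
object FetcherGroupingSketch {
  // Local stand-ins for the Kafka types; a broker is reduced to its id.
  final case class TopicAndPartition(topic: String, partition: Int)
  final case class BrokerAndFetcherId(brokerId: Int, fetcherId: Int)

  // Assumed rule: spread partitions over numFetchers slots by hashing topic and partition.
  def fetcherId(tp: TopicAndPartition, numFetchers: Int): Int =
    Math.floorMod(31 * tp.topic.hashCode + tp.partition, numFetchers)

  // Group partitions by (leader broker, fetcher id); each key corresponds to one fetcher thread.
  def group(
      leaders: Map[TopicAndPartition, Int],
      numFetchers: Int): Map[BrokerAndFetcherId, Set[TopicAndPartition]] =
    leaders
      .groupBy { case (tp, brokerId) => BrokerAndFetcherId(brokerId, fetcherId(tp, numFetchers)) }
      .map { case (key, partitions) => key -> partitions.keySet }
}
```

With numFetchers set to 1, every partition led by the same broker lands in the same group, so the consumer runs exactly one fetcher thread per leader broker.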
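2. Execution logic of ConsumerFetcherThread
ConsumerFetcherThread iterates over its internal partitionMap, fetches the TopicAndPartitions it contains, and puts the fetched data on the BlockingQueue.
ConsumerFetcherThread extends AbstractFetcherThread. The doWork method of AbstractFetcherThread reads each topicAndPartition and offset from partitionMap, sends a FetchRequest to the Broker Server, and then updates the offsets in partitionMap. Finally, it calls ConsumerFetcherThread's processPartitionData function to handle the fetched partition data. The code is as follows: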
override def doWork() {
  inLock(partitionMapLock) {
    // Wait briefly if partitionMap is empty
    if (partitionMap.isEmpty)
      partitionMapCond.await(200L, TimeUnit.MILLISECONDS)
    // Iterate over partitionMap to assemble the FetchRequest
    partitionMap.foreach {
      case((topicAndPartition, offset)) =>
        fetchRequestBuilder.addFetch(topicAndPartition.topic, topicAndPartition.partition,
          offset, fetchSize)
    }
  }
  val fetchRequest = fetchRequestBuilder.build()
  if (!fetchRequest.requestInfo.isEmpty)
    // Send the FetchRequest and process the response
    processFetchRequest(fetchRequest)
}
private def processFetchRequest(fetchRequest: FetchRequest) {
  val partitionsWithError = new mutable.HashSet[TopicAndPartition]
  var response: FetchResponse = null
  try {
    // Send the FetchRequest to the Broker Server
    response = simpleConsumer.fetch(fetchRequest)
  } catch {
    case t: Throwable =>
      if (isRunning.get) {
        partitionMapLock synchronized {
          // The request failed: record every partition in partitionsWithError
          partitionsWithError ++= partitionMap.keys
        }
      }
  }
  // Record the request rate
  fetcherStats.requestRate.mark()
  if (response != null) {
    // process fetched data
    inLock(partitionMapLock) {
      // Iterate over the FetchResponse
      response.data.foreach {
        case(topicAndPartition, partitionData) =>
          val (topic, partitionId) = topicAndPartition.asTuple
          val currentOffset = partitionMap.get(topicAndPartition)
          // The response is valid only if the offset in partitionMap still matches the offset in the FetchRequest
          if (currentOffset.isDefined && fetchRequest.requestInfo(topicAndPartition).offset == currentOffset.get) {
            partitionData.error match {
              // The fetch succeeded
              case ErrorMapping.NoError =>
                try {
                  // Get the ByteBufferMessageSet for this topicAndPartition
                  val messages = partitionData.messages.asInstanceOf[ByteBufferMessageSet]
                  // Number of valid bytes in the ByteBufferMessageSet
                  val validBytes = messages.validBytes
                  // Next offset to fetch: the offset after the last message, or the current offset if the set is empty
                  val newOffset = messages.shallowIterator.toSeq.lastOption match {
                    case Some(m: MessageAndOffset) => m.nextOffset
                    case None => currentOffset.get
                  }
                  // Update partitionMap
                  partitionMap.put(topicAndPartition, newOffset)
                  fetcherLagStats.getFetcherLagStats(topic, partitionId).lag = partitionData.hw - newOffset
                  fetcherStats.byteRate.mark(validBytes)
                  /* Call ConsumerFetcherThread's processPartitionData,
                   * which in essence puts partitionData on the BlockingQueue
                   */
                  processPartitionData(topicAndPartition, currentOffset.get, partitionData)
                } catch {
                  ......
                }
              case ErrorMapping.OffsetOutOfRangeCode =>
                try {
                  // The offset is out of range: reset it
                  val newOffset = handleOffsetOutOfRange(topicAndPartition)
                  partitionMap.put(topicAndPartition, newOffset)
                  error("Current offset %d for partition [%s,%d] out of range; reset offset to %d"
                    .format(currentOffset.get, topic, partitionId, newOffset))
                } catch {
                  case e: Throwable =>
                    error("Error getting offset for partition [%s,%d] to broker %d".format(topic, partitionId, sourceBroker.id), e)
                    partitionsWithError += topicAndPartition
                }
              case _ =>
                if (isRunning.get) {
                  error("Error for partition [%s,%d] to broker %d:%s".format(topic, partitionId, sourceBroker.id,
                    ErrorMapping.exceptionFor(partitionData.error).getClass))
                  partitionsWithError += topicAndPartition
                }
            }
          }
      }
    }
  }
}
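Two rules in processFetchRequest are worth isolating: a response is ignored when the offset it was requested with no longer matches the current offset in partitionMap, and otherwise the new offset is the nextOffset of the last message, or the unchanged current offset for an empty response. A minimal sketch of just those two rules follows. MessageAndOffset here is a local stand-in for the Kafka class, nextOffset is assumed to be offset + 1, and advance is a hypothetical helper name.

```scala
object OffsetAdvanceSketch {
  // Local stand-in for kafka.message.MessageAndOffset; nextOffset = offset + 1 is assumed.
  final case class MessageAndOffset(payload: String, offset: Long) {
    def nextOffset: Long = offset + 1
  }

  // Returns None when the response must be ignored (unknown partition or stale request),
  // otherwise Some(offset to use for the next fetch).
  def advance(
      currentOffset: Option[Long],
      requestedOffset: Long,
      fetched: Seq[MessageAndOffset]): Option[Long] =
    currentOffset
      .filter(_ == requestedOffset)
      .map(current => fetched.lastOption.map(_.nextOffset).getOrElse(current))
}
```

The stale-request check matters because partitionMap can change while a fetch is in flight, for example when a partition is removed and re-added with a different offset during a rebalance.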
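The partitionMap parameter of ConsumerFetcherThread maps each TopicAndPartition to a PartitionTopicInfo (unlike the partitionMap inside AbstractFetcherThread used above, which maps to offsets), and the chunkQueue parameter of PartitionTopicInfo is the BlockingQueue for that TopicAndPartition. processPartitionData gets the chunkQueue from the PartitionTopicInfo and puts the message set on it. It is implemented as follows: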
def processPartitionData(
    topicAndPartition: TopicAndPartition,
    fetchOffset: Long,
    partitionData: FetchResponsePartitionData) {
  // Get the PartitionTopicInfo
  val pti = partitionMap(topicAndPartition)
  // Check that the fetchOffset is still the one the PartitionTopicInfo expects
  if (pti.getFetchOffset != fetchOffset)
    throw new RuntimeException("Offset doesn't match for partition [%s,%d] pti offset: %d fetch offset: %d"
      .format(topicAndPartition.topic, topicAndPartition.partition, pti.getFetchOffset, fetchOffset))
  /*
   * Call PartitionTopicInfo's enqueue function, which
   * 1) puts the messages on the BlockingQueue
   * 2) updates the fetchOffset in the PartitionTopicInfo
   */
  pti.enqueue(partitionData.messages.asInstanceOf[ByteBufferMessageSet])
}
So each element of the BlockingQueue is a set of messages covering a certain range of offsets.
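The two effects of enqueue can be sketched without the Kafka classes. In the sketch below, Chunk and PartitionInfo are hypothetical stand-ins for FetchedDataChunk and PartitionTopicInfo, messages are plain Strings, and offsets are assumed to be contiguous so that the next fetch offset is the previous one plus the number of messages.

```scala
import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.atomic.AtomicLong

object ChunkQueueSketch {
  // Hypothetical stand-in for FetchedDataChunk: a batch of messages plus the offset it starts at.
  final case class Chunk(topic: String, partition: Int, firstOffset: Long, messages: Vector[String])

  // Hypothetical stand-in for PartitionTopicInfo.
  final class PartitionInfo(
      topic: String,
      partition: Int,
      chunkQueue: LinkedBlockingQueue[Chunk],
      initialOffset: Long) {
    private val fetchOffset = new AtomicLong(initialOffset)

    def getFetchOffset: Long = fetchOffset.get

    // 1) put the batch on the queue (blocks while the queue is full),
    // 2) advance the fetch offset past the batch (contiguous offsets assumed).
    def enqueue(messages: Vector[String]): Unit =
      if (messages.nonEmpty) {
        chunkQueue.put(Chunk(topic, partition, fetchOffset.get, messages))
        fetchOffset.addAndGet(messages.size.toLong)
      }
  }
}
```

Because put blocks when the queue is full, a slow consumer thread naturally throttles the fetcher thread that feeds it, and the bound on the queue (config.queuedMaxMessages in the connector code) limits how many chunks are buffered in memory.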