这一节我们主要来分析joinGroup这块的代码,主要流程如图一。
流程展示
拆解JoinGroup协议
客户端的代码我们就不拿出来说了,等后面说到关键点的时候再拿出来一起分析,这里目前只需要知道会发JoinGroupRequest请求给服务端,请求及返回数据样例如下,协议的结构见图二及图三。
JoinGroupRequestData(groupId=‘mykafka-group’, sessionTimeoutMs=10000, rebalanceTimeoutMs=300000,
memberId=‘consumer-mykafka-group-1-e767a2e9-ac9d-4e61-af95-e50894101de9’, groupInstanceId=null, protocolType=‘consumer’, protocols=[JoinGroupRequestProtocol(name=‘cooperative-sticky’, metadata=[0, 1, 0, 0, 0, 1, 0, 6, 116, 101, 115, 116, 95, 50, -1, -1, -1, -1, 0, 0, 0, 0])])JoinGroupResponseData(throttleTimeMs=0, errorCode=0, generationId=1, protocolType=‘consumer’, protocolName=‘sticky’, leader=‘mykafka-group_4_1-c3c6047e-bbfe-48fb-8eba-cbb55d97ada8’, memberId=‘mykafka-group_4_1-c3c6047e-bbfe-48fb-8eba-cbb55d97ada8’, members=[JoinGroupResponseMember(memberId=‘mykafka-group_4_1-c3c6047e-bbfe-48fb-8eba-cbb55d97ada8’, groupInstanceId=‘mykafka-group_4_1’, metadata=[0, 1, 0, 0, 0, 1, 0, 7, 116, 111, 112, 105, 99, 95, 49, -1, -1, -1, -1, 0, 0, 0, 0])])
服务端处理
服务端对JoinGroupRequest请求会在kafka.server.KafkaApis#handleJoinGroupRequest代码中做处理。从图一的流程图中我们可以知道是否存在memberId及groupInstanceId会走不同的代码分支,下面我们以下几个方面来剖析源码。
如果JoinGroupRequest请求中memberId为空,groupInstanceId为空的情况
这里对应流程图中的分支见图四
首先在入口处会有个是否需要memberId的判断,代码如下,这是为了兼容低版本的消费端。在kafka2.5中,joinGroupRequest.version是7,经过比对各个版本的源码,kafka2.3以上joinGroupRequest.version会大于等于4,即在groupInstanceId为空的情况下是需要memberId的。
val requireKnownMemberId = joinGroupRequest.version >= 4 && groupInstanceId.isEmpty
接着会走kafka.coordinator.group.GroupCoordinator#doUnknownJoinGroup,代码如下,由于此时memberId及groupInstanceId均为空,会走(requireKnownMemberId) 为true的逻辑,大家可以看到如果需要memberId的话这里会生成memberId然后返回response,可以再往深一点想,即在consume初始消费的时候,在没有groupInstanceId的情况下,会发送两次请求,第一次请求memberId为空,服务端会返回对应的memberId,第二次请求会带上memberId进行真正的joinGroup操作。
private def doUnknownJoinGroup(group: GroupMetadata,
groupInstanceId: Option[String],
requireKnownMemberId: Boolean,
clientId: String,
clientHost: String,
rebalanceTimeoutMs: Int,
sessionTimeoutMs: Int,
protocolType: String,
protocols: List[(String, Array[Byte])],
responseCallback: JoinCallback): Unit = {
group.inLock {
//3.1.1 生成memberId优先使用groupInstanceId,没有则用clientId
val newMemberId = group.generateMemberId(clientId, groupInstanceId)
//这里首先判断缓存中有没有groupInstanceId,如果没有的话又需要生成memberId,则生成memberId后返回,如果存在groupInstanceId,则在生成memberId
// 后直接调用addMemberAndRebalance
if (group.hasStaticMember(groupInstanceId)) {
updateStaticMemberAndRebalance(group, newMemberId, groupInstanceId, protocols, responseCallback)
} else if (requireKnownMemberId) {
//3.1.2 如果需要memberId的话会直接返回response
debug(s"Dynamic member with unknown member id joins group ${group.groupId} in " +
s"${group.currentState} state. Created a new member id $newMemberId and request the member to rejoin with this id.")
group.addPendingMember(newMemberId)
addPendingMemberExpiration(group, newMemberId, sessionTimeoutMs)
responseCallback(JoinGroupResult(newMemberId, Errors.MEMBER_ID_REQUIRED))
} else {
info(s"${if (groupInstanceId.isDefined) "Static" else "Dynamic"} Member with unknown member id joins group ${group.groupId} in " +
s"${group.currentState} state. Created a new member id $newMemberId for this member and add to the group.")
addMemberAndRebalance(rebalanceTimeoutMs, sessionTimeoutMs, newMemberId, groupInstanceId,
clientId, clientHost, protocolType, protocols, group, responseCallback)
}
}
}
kafka.coordinator.group.GroupMetadata#generateMemberId
生成memberId的代码很简单,就是clientId或者groupInstanceId加uuid
def generateMemberId(clientId: String,
groupInstanceId: Option[String]): String = {
var memberId="";
groupInstanceId match {
case None =>
memberId = clientId + GroupMetadata.MemberIdDelimiter + UUID.randomUUID().toString
case Some(instanceId) =>
memberId = instanceId + GroupMetadata.MemberIdDelimiter + UUID.randomUUID().toString
}
memberId
}
如果JoinGroupRequest请求中memberId不为空,groupInstanceId为空的情况
这里对应流程图中的分支见图五
在这种情况下是走的kafka.coordinator.group.GroupCoordinator#doJoinGroup方法。看着方法比较长,实际就分两块,经过各种校验之后,要不走addMemberAndRebalance,要不就走updateMemberAndRebalance。
private def doJoinGroup(group: GroupMetadata,
memberId: String,
groupInstanceId: Option[String],
clientId: String,
clientHost: String,
rebalanceTimeoutMs: Int,
sessionTimeoutMs: Int,
protocolType: String,
protocols: List[(String, Array[Byte])],
responseCallback: JoinCallback): Unit = {
group.inLock {
if (group.is(Dead)) {
responseCallback(JoinGroupResult(memberId, Errors.COORDINATOR_NOT_AVAILABLE))
} else if (!group.supportsProtocols(protocolType, MemberMetadata.plainProtocolSet(protocols))) {
responseCallback(JoinGroupResult(memberId, Errors.INCONSISTENT_GROUP_PROTOCOL))
} else if (group.isPendingMember(memberId)) {
// A rejoining pending member will be accepted. Note that pending member will never be a static member.
if (groupInstanceId.isDefined) {
throw new IllegalStateException(s"the static member $groupInstanceId was not expected to be assigned " +
s"into pending member bucket with member id $memberId")
} else {
debug(s"Dynamic Member with specific member id $memberId joins group ${group.groupId} in " +
s"${group.currentState} state. Adding to the group now.")
addMemberAndRebalance(rebalanceTimeoutMs, sessionTimeoutMs, memberId, groupInstanceId,
clientId, clientHost, protocolType, protocols, group, responseCallback)
}
} else {
val groupInstanceIdNotFound = groupInstanceId.isDefined && !group.hasStaticMember(groupInstanceId)
if (group.isStaticMemberFenced(memberId, groupInstanceId, "join-group")) {
// given member id doesn't match with the groupInstanceId. Inform duplicate instance to shut down immediately.
responseCallback(JoinGroupResult(memberId, Errors.FENCED_INSTANCE_ID))
} else if (!group.has(memberId) || groupInstanceIdNotFound) {
responseCallback(JoinGroupResult(memberId, Errors.UNKNOWN_MEMBER_ID))
} else {
val member = group.get(memberId)
group.currentState match {
case PreparingRebalance =>
updateMemberAndRebalance(group, member, protocols, responseCallback)
case CompletingRebalance =>
if (member.matches(protocols)) {
// member is joining with the same metadata (which could be because it failed to
// receive the initial JoinGroup response), so just return current group information
// for the current generation.
responseCallback(JoinGroupResult(
members = if (group.isLeader(memberId)) {
group.currentMemberMetadata
} else {
List.empty
},
memberId = memberId,
generationId = group.generationId,
protocolType = group.protocolType,
protocolName = group.protocolName,
leaderId = group.leaderOrNull,
error = Errors.NONE))
} else {
// member has changed metadata, so force a rebalance
updateMemberAndRebalance(group, member, protocols, responseCallback)
}
case Stable =>
val member = group.get(memberId)
if (group.isLeader(memberId) || !member.matches(protocols)) {
// force a rebalance if a member has changed metadata or if the leader sends JoinGroup.
// The latter allows the leader to trigger rebalances for changes affecting assignment
// which do not affect the member metadata (such as topic metadata changes for the consumer)
updateMemberAndRebalance(group, member, protocols, responseCallback)
} else {
// for followers with no actual change to their metadata, just return group information
// for the current generation which will allow them to issue SyncGroup
responseCallback(JoinGroupResult(
members = List.empty,
memberId = memberId,
generationId = group.generationId,
protocolType = group.protocolType,
protocolName = group.protocolName,
leaderId = group.leaderOrNull,
error = Errors.NONE))
}
case Empty | Dead =>
// Group reaches unexpected state. Let the joining member reset their generation and rejoin.
warn(s"Attempt to add rejoining member $memberId of group ${group.groupId} in " +
s"unexpected group state ${group.currentState}")
responseCallback(JoinGroupResult(memberId, Errors.UNKNOWN_MEMBER_ID))
}
}
}
}
}
到底什么时候会走addMemberAndRebalance,什么时候会走updateMemberAndRebalance呢
1、addMemberAndRebalance:
- 在pendingMembers中,通过前面的代码分析可知在没有memberId且没有groupInstanceId的情况下,会生成memberId并返回,在那个时候就会将生成的memberId
放入pendingMembers中,表示待加入的member。也就是说在这种情况下,客户端第二次带上memberId请求服务端的时候会走走addMemberAndRebalance的逻辑。 - 或者不需要memberId(为了兼容老版本的客户端)
2、updateMemberAndRebalance
- 在group状态为PreparingRebalance时,即触发了joinGroup,在等待其他组成员加入的状态,这里会直接调用updateMemberAndRebalance
- 在group状态为CompletingRebalance时,即组成员已全部加入,并选举出consumeLeader
,等待同步分配方案的状态,这时会匹配传入的协议信息,如果不匹配则调用updateMemberAndRebalance,会再次触发rebalance - 在group状态为Stable时,即完成了重分配,处于稳定态,如果是consumeleader或者不匹配协议信息,则调用updateMemberAndRebalance
kafka.coordinator.group.GroupCoordinator#addMemberAndRebalance
这里主要是多一步,会选举消费端leader,代码也很简单,校验如果leaderId为空则取当前的member为leader,然后调用maybePrepareRebalance
private def addMemberAndRebalance(rebalanceTimeoutMs: Int,
sessionTimeoutMs: Int,
memberId: String,
groupInstanceId: Option[String],
clientId: String,
clientHost: String,
protocolType: String,
protocols: List[(String, Array[Byte])],
group: GroupMetadata,
callback: JoinCallback): Unit = {
val member = new MemberMetadata(memberId, group.groupId, groupInstanceId,
clientId, clientHost, rebalanceTimeoutMs,
sessionTimeoutMs, protocolType, protocols)
member.isNew = true
info(s"group.generationId:${group.generationId}")
// update the newMemberAdded flag to indicate that the join group can be further delayed
if (group.is(PreparingRebalance) && group.generationId == 0)
group.newMemberAdded = true
//会选举leader
group.add(member, callback)
completeAndScheduleNextExpiration(group, member, NewMemberJoinTimeoutMs)
if (member.isStaticMember) {
info(s"Adding new static member $groupInstanceId to group ${group.groupId} with member id $memberId.")
group.addStaticMember(groupInstanceId, memberId)
} else {
group.removePendingMember(memberId)
}
maybePrepareRebalance(group, s"Adding new member $memberId with group instance id $groupInstanceId")
}
kafka.coordinator.group.GroupCoordinator#updateMemberAndRebalance
可以看到updateMemberAndRebalance更简单,就是更新group中的member信息,然后调用maybePrepareRebalance
private def updateMemberAndRebalance(group: GroupMetadata,
member: MemberMetadata,
protocols: List[(String, Array[Byte])],
callback: JoinCallback): Unit = {
group.updateMember(member, protocols, callback)
maybePrepareRebalance(group, s"Updating metadata for member ${member.memberId}")
}
kafka.coordinator.group.GroupCoordinator#maybePrepareRebalance
这里也涉及到了group的状态,如果是Stable, CompletingRebalance, Empty三种状态才可调用rebalance方法
private def maybePrepareRebalance(group: GroupMetadata, reason: String): Unit = {
group.inLock {
//校验group的状态,如果是PreparingRebalance或者Dead是不允许rebalance的
if (group.canRebalance)
prepareRebalance(group, reason)
}
}
如果JoinGroupRequest请求中memberId为空,groupInstanceId不为空的情况
这里对应流程图中的分支见图六
在这种情况下依然会生成一个新的memberId,requireKnownMemberId判定会是false。
- 如果是首次请求的话,在内存中不存在groupInstanceId的记录,所以会走addMemberAndRebalance的逻辑。相对于没有groupInstanceId的消费者,会减少一次JoinGroupRequest请求。
- 如果是内存中已有的groupInstanceId的话,会走updateStaticMemberAndRebalance的逻辑,代码如下,无非也就是删除之前的member信息,然后建立新的memberId与member
的关系,这里有一点非常奇怪,可以看到在group.updateMember之后,对oldProtocols赋值,而oldProtocols的作用就是groupManager
.storeGroup报错的时候使用,根据代码注释来看应该是想回滚设置的member及协议的信息,但oldProtocols是在group
.updateMember之后赋值的,所以获取到的一直都是更新后的协议信息,不知道这里是不是kafka的bug,对此已提issue:https://issues.apache.org/jira/browse/KAFKA-13581
private def updateStaticMemberAndRebalance(group: GroupMetadata,
newMemberId: String,
groupInstanceId: Option[String],
protocols: List[(String, Array[Byte])],
responseCallback: JoinCallback): Unit = {
//获取原memberId
val oldMemberId = group.getStaticMemberId(groupInstanceId)
val currentLeader = group.leaderOrNull
//将新memberId更新到内存
val member = group.replaceGroupInstance(oldMemberId, newMemberId, groupInstanceId)
completeAndScheduleNextHeartbeatExpiration(group, member)
val knownStaticMember = group.get(newMemberId)
group.updateMember(knownStaticMember, protocols, responseCallback)
val oldProtocols = knownStaticMember.supportedProtocols
group.currentState match {
case Stable =>
// check if group's selectedProtocol of next generation will change, if not, simply store group to persist the
// updated static member, if yes, rebalance should be triggered to let the group's assignment and selectProtocol consistent
val selectedProtocolOfNextGeneration = group.selectProtocol
if (group.protocolName.contains(selectedProtocolOfNextGeneration)) {
info(s"Static member which joins during Stable stage and doesn't affect selectProtocol will not trigger rebalance.")
val groupAssignment: Map[String, Array[Byte]] = group.allMemberMetadata.map(member => member.memberId -> member.assignment).toMap
groupManager.storeGroup(group, groupAssignment, error => {
if (error != Errors.NONE) {
warn(s"Failed to persist metadata for group ${group.groupId}: ${error.message}")
// Failed to persist member.id of the given static member, revert the update of the static member in the group.
group.updateMember(knownStaticMember, oldProtocols, null)
val oldMember = group.replaceGroupInstance(newMemberId, oldMemberId, groupInstanceId)
completeAndScheduleNextHeartbeatExpiration(group, oldMember)
responseCallback(JoinGroupResult(
List.empty,
memberId = JoinGroupRequest.UNKNOWN_MEMBER_ID,
generationId = group.generationId,
protocolType = group.protocolType,
protocolName = group.protocolName,
leaderId = currentLeader,
error = error
))
} else {
group.maybeInvokeJoinCallback(member, JoinGroupResult(
members = List.empty,
memberId = newMemberId,
generationId = group.generationId,
protocolType = group.protocolType,
protocolName = group.protocolName,
// We want to avoid current leader performing trivial assignment while the group
// is in stable stage, because the new assignment in leader's next sync call
// won't be broadcast by a stable group. This could be guaranteed by
// always returning the old leader id so that the current leader won't assume itself
// as a leader based on the returned message, since the new member.id won't match
// returned leader id, therefore no assignment will be performed.
leaderId = currentLeader,
error = Errors.NONE))
}
})
} else {
maybePrepareRebalance(group, s"Group's selectedProtocol will change because static member ${member.memberId} with instance id $groupInstanceId joined with change of protocol")
}
case CompletingRebalance =>
// if the group is in after-sync stage, upon getting a new join-group of a known static member
// we should still trigger a new rebalance, since the old member may already be sent to the leader
// for assignment, and hence when the assignment gets back there would be a mismatch of the old member id
// with the new replaced member id. As a result the new member id would not get any assignment.
prepareRebalance(group, s"Updating metadata for static member ${member.memberId} with instance id $groupInstanceId")
case Empty | Dead =>
throw new IllegalStateException(s"Group ${group.groupId} was not supposed to be " +
s"in the state ${group.currentState} when the unknown static member $groupInstanceId rejoins.")
case PreparingRebalance =>
}
}
如果JoinGroupRequest请求中memberId不为空,groupInstanceId不为空的情况
在这种情况下表示带有groupInstanceId的消费者心跳失败又重新加入组,与第二种情况一样,会根据状态判断是否需要调用updateMemberAndRebalance
总结
在这篇文章里我们只是对joinGroup有了个大概的了解,主要分为以下几点
- 如果JoinGroupRequest请求中不存在memberId,会生成一个新的memberId,如果groupInstanceId为空,则会立即返回,让客户端带着memberId再请求一次
- group也有状态转换
- 服务端针对客户端leader的选举很简单,就是判断leaderId为空的话就取当前memberId
- kafka针对groupInstanceId的处理就是在内存中增加了与memberId映射关系,在无memberId加入组时会减少一次JoinGroupRequest的请求。