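Duplicate consumption in MQ means that multiple instances of the same application receive the same message, or that a single instance receives the same message more than once. If the consumer logic is not idempotent, the message ends up being processed repeatedly.
The usual consumer-side remedy is to extract the message identifier when a message arrives and write it to Redis or a database; when the same message arrives again, it is skipped. Apart from retries, a large share of duplicate deliveries comes from the load-balancing (rebalance) phase: the consumer instance that was listening on a Queue has not yet acked everything it pulled, and a new consumer instance starts listening on the same Queue and pulls those messages again.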
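The deduplication step described above can be sketched as follows. This is a minimal illustration, not code from DeFiBus or RocketMQ: the class IdempotentConsumer and its method onMessage are hypothetical names, and an in-memory set stands in for Redis or a database table with a unique key on the message identifier.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical helper for illustration; not part of RocketMQ or DeFiBus.
final class IdempotentConsumer {
    // Stand-in for Redis or a database table keyed by message ID (assumption).
    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();
    private final Consumer<String> handler;

    IdempotentConsumer(Consumer<String> handler) {
        this.handler = handler;
    }

    /** Returns true if the message was processed, false if it was skipped as a duplicate. */
    boolean onMessage(String messageId, String body) {
        // add() is atomic: only the first caller for a given ID sees true.
        if (!processedIds.add(messageId)) {
            return false;
        }
        try {
            handler.accept(body);
            return true;
        } catch (RuntimeException e) {
            // Forget the ID so that a retry of a failed message is not swallowed.
            processedIds.remove(messageId);
            throw e;
        }
    }

    public static void main(String[] args) {
        IdempotentConsumer consumer =
            new IdempotentConsumer(body -> System.out.println("handled: " + body));
        consumer.onMessage("msg-1", "order created"); // processed
        consumer.onMessage("msg-1", "order created"); // skipped as a duplicate
    }
}
```

In a real deployment the set would need to be shared by all instances of the consumer group and given an expiry, which is why the text suggests Redis or a database rather than process-local memory.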
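To solve the problem of a Queue being listened to twice, or not at all, during load balancing, WeBank added a transitional state to the process in which the load-balancing result changes. While in the transitional state, the Consumer keeps the previous load-balancing result and releases it only after all messages pulled by the original consumer have been acked.
The change is implemented by adding a ConsumeQueueAccessLockManager class on the RocketMQ Broker side, which places a lock on each Queue. When a new Consumer pulls messages, the Broker checks whether the Queue that Consumer listens on has messages that were delivered but have neither been acked nor timed out. If so, the Consumer is not allowed to acquire the lock; only once every message delivered from that Queue has been acked, or consumption has timed out, may the Consumer acquire the lock and pull messages.
The lock-acquisition logic in ConsumeQueueAccessLockManager is as follows: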
public synchronized boolean updateAccessControlTable(String group, String topic, String clientId, int queueId) {
    if (group != null && topic != null && clientId != null) {
        ConcurrentHashMap<String/*Topic*/, ConcurrentHashMap<Integer/*queueId*/, AccessLockEntry>> topicTable = accessLockTable.get(group);
        if (topicTable == null) {
            topicTable = new ConcurrentHashMap<>();
            accessLockTable.put(group, topicTable);
            LOG.info("group not exist, put group:{}", group);
        }

        ConcurrentHashMap<Integer/*queueId*/, AccessLockEntry> queueIdTable = topicTable.get(topic);
        if (queueIdTable == null) {
            queueIdTable = new ConcurrentHashMap<>();
            topicTable.put(topic, queueIdTable);
            LOG.info("topic not exist, put topic:{} into group {}", topic, group);
        }

        AccessLockEntry accessEntry = queueIdTable.get(queueId);
        // Nobody holds the Queue yet: the requesting client takes the lock
        if (accessEntry == null) {
            long deliverOffset = brokerController.getConsumeQueueManager().queryDeliverOffset(group, topic, queueId);
            accessEntry = new AccessLockEntry(clientId, System.currentTimeMillis(), deliverOffset);
            queueIdTable.put(queueId, accessEntry);
            LOG.info("mq is not locked. I got it. group:{}, topic:{}, queueId:{}, newClient:{}",
                group, topic, queueId, clientId);
            return true;
        }

        // This client already holds the Queue: just refresh the timestamp
        if (clientId.equals(accessEntry.getClientId())) {
            accessEntry.setLastAccessTimestamp(System.currentTimeMillis());
            accessEntry.setLastDeliverOffset(brokerController.getConsumeQueueManager().queryDeliverOffset(group, topic, queueId));
            return false;
        }
        // Only a client that does not hold the Queue, and whose request is not a wakeup request, may try to take the lock
        else {
            long holdTimeThreshold = brokerController.getDeFiBusBrokerConfig().getLockQueueTimeout();
            long realHoldTime = System.currentTimeMillis() - accessEntry.getLastAccessTimestamp();
            boolean holdTimeout = (realHoldTime > holdTimeThreshold);

            // Count how many consecutive checks have seen the deliver offset unchanged
            long deliverOffset = brokerController.getConsumeQueueManager().queryDeliverOffset(group, topic, queueId);
            long lastDeliverOffset = accessEntry.getLastDeliverOffset();
            if (deliverOffset == lastDeliverOffset) {
                accessEntry.getDeliverOffsetNoChangeTimes().incrementAndGet();
            } else {
                accessEntry.setLastDeliverOffset(deliverOffset);
                accessEntry.setDeliverOffsetNoChangeTimes(0);
            }

            // diff == 0 means every delivered message has been acked
            long ackOffset = brokerController.getConsumeQueueManager().queryOffset(group, topic, queueId);
            long diff = deliverOffset - ackOffset;
            boolean offsetEqual = (diff == 0);
            int deliverOffsetNoChangeTimes = accessEntry.getDeliverOffsetNoChangeTimes().get();
            boolean deliverNoChange = (deliverOffsetNoChangeTimes >= brokerController.getDeFiBusBrokerConfig().getMaxDeliverOffsetNoChangeTimes());

            // Hand the lock over when everything is acked and delivery has stopped, or when the holder has timed out
            if ((offsetEqual && deliverNoChange) || holdTimeout) {
                LOG.info("tryLock mq, update access lock table. topic:{}, queueId:{}, newClient:{}, oldClient:{}, hold time threshold:{}, real hold time:{}, hold timeout:{}, offset equal:{}, diff:{}, deliverOffset no change:{}, deliverOffset:{}, ackOffset:{}",
                    topic,
                    queueId,
                    clientId,
                    accessEntry.getClientId(),
                    holdTimeThreshold,
                    realHoldTime,
                    holdTimeout,
                    offsetEqual,
                    diff,
                    deliverNoChange,
                    deliverOffset,
                    ackOffset);
                accessEntry.setLastAccessTimestamp(System.currentTimeMillis());
                accessEntry.setLastDeliverOffset(deliverOffset);
                accessEntry.getDeliverOffsetNoChangeTimes().set(0);
                accessEntry.setClientId(clientId);
                return true;
            }

            LOG.info("tryLock mq, but mq locked by other client: {}, group: {}, topic: {}, queueId: {}, nowClient:{}, hold timeout:{}, offset equal:{}, deliverOffset no change times:{}", accessEntry.getClientId(),
                group, topic, queueId, clientId, holdTimeout, offsetEqual, deliverOffsetNoChangeTimes);
            return false;
        }
    }
    return false;
}
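The takeover condition in the listing above can be isolated into a small pure function, which makes it easier to reason about and to test. The sketch below is a condensation for illustration only: the class LockTakeoverRule and the method mayTakeOver are hypothetical names, while the two thresholds correspond to the values the listing reads from getLockQueueTimeout() and getMaxDeliverOffsetNoChangeTimes().

```java
// Hypothetical condensation of the takeover rule; not a class from DeFiBus.
final class LockTakeoverRule {
    private final long holdTimeThresholdMillis; // value of getLockQueueTimeout() in the listing
    private final int maxNoChangeTimes;         // value of getMaxDeliverOffsetNoChangeTimes() in the listing

    LockTakeoverRule(long holdTimeThresholdMillis, int maxNoChangeTimes) {
        this.holdTimeThresholdMillis = holdTimeThresholdMillis;
        this.maxNoChangeTimes = maxNoChangeTimes;
    }

    /**
     * Decides whether a client that does not hold the Queue may take the lock.
     *
     * @param deliverOffset  offset up to which messages have been delivered
     * @param ackOffset      offset up to which messages have been acked
     * @param noChangeTimes  consecutive checks in which deliverOffset did not move
     * @param realHoldTime   milliseconds since the current holder last accessed the Queue
     */
    boolean mayTakeOver(long deliverOffset, long ackOffset, int noChangeTimes, long realHoldTime) {
        boolean offsetEqual = (deliverOffset - ackOffset) == 0;     // everything delivered is acked
        boolean deliverNoChange = noChangeTimes >= maxNoChangeTimes; // delivery has gone quiet
        boolean holdTimeout = realHoldTime > holdTimeThresholdMillis;
        return (offsetEqual && deliverNoChange) || holdTimeout;
    }

    public static void main(String[] args) {
        LockTakeoverRule rule = new LockTakeoverRule(60_000L, 3);
        System.out.println(rule.mayTakeOver(100, 90, 5, 1_000));  // unacked messages, holder still active
        System.out.println(rule.mayTakeOver(100, 100, 3, 1_000)); // fully acked and quiet
    }
}
```

Requiring both offsetEqual and deliverNoChange, rather than offsetEqual alone, guards against handing the Queue over at a moment when the old holder happens to be caught up but is still actively pulling; the timeout is the escape hatch for a holder that has died without acking.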
The message-pulling logic in DeFiPullMessageProcessor is as follows:
@Override
public RemotingCommand processRequest(final ChannelHandlerContext ctx,
    RemotingCommand request) throws RemotingCommandException {
    final PullMessageRequestHeader requestHeader =
        (PullMessageRequestHeader) request.decodeCommandCustomHeader(PullMessageRequestHeader.class);
    ConsumerGroupInfo consumerGroupInfo = deFiBrokerController.getConsumerManager().getConsumerGroupInfo(requestHeader.getConsumerGroup());

    if (deFiBrokerController.getDeFiBusBrokerConfig().getMqAccessControlEnable() == 1) {
        // Access-table control applies only in clustering mode
        if (consumerGroupInfo != null && consumerGroupInfo.getMessageModel() == MessageModel.CLUSTERING) {
            ClientChannelInfo clientChannelInfo = consumerGroupInfo.getChannelInfoTable().get(ctx.channel());
            if (clientChannelInfo != null) {
                String group = consumerGroupInfo.getGroupName();
                String topic = requestHeader.getTopic();
                int queueId = requestHeader.getQueueId();
                String clientId = clientChannelInfo.getClientId();

                // acquired is true only when this request has just taken the lock
                boolean acquired = deFiBrokerController.getMqAccessLockManager().updateAccessControlTable(group, topic, clientId, queueId);
                boolean isAllowed = deFiBrokerController.getMqAccessLockManager().isAccessAllowed(group,topic,clientId,queueId);

                // The Queue is not assigned to this client: return an empty result
                if (!isAllowed) {
                    RemotingCommand response = RemotingCommand.createResponseCommand(PullMessageResponseHeader.class);
                    final PullMessageResponseHeader responseHeader = (PullMessageResponseHeader) response.readCustomHeader();
                    LOG.info("pull message rejected. queue is locked by other client. group:{}, topic:{}, queueId:{}, queueOffset:{}, request clientId:{}",
                        requestHeader.getConsumerGroup(), requestHeader.getTopic(), requestHeader.getQueueId(), requestHeader.getQueueOffset(), clientId);

                    responseHeader.setMinOffset(deFiBrokerController.getMessageStore().getMinOffsetInQueue(requestHeader.getTopic(), requestHeader.getQueueId()));
                    responseHeader.setMaxOffset(deFiBrokerController.getMessageStore().getMaxOffsetInQueue(requestHeader.getTopic(), requestHeader.getQueueId()));
                    responseHeader.setNextBeginOffset(requestHeader.getQueueOffset());
                    responseHeader.setSuggestWhichBrokerId(MixAll.MASTER_ID);
                    response.setCode(ResponseCode.PULL_NOT_FOUND);
                    response.setRemark("mq is locked by other client.");
                    return response;
                }

                // After being assigned a Queue, reset the pull offset to the latest ackOffset to avoid duplicate messages
                if (acquired) {
                    long nextBeginOffset = correctRequestOffset(group, topic, queueId, requestHeader.getQueueOffset());
                    if (nextBeginOffset != requestHeader.getQueueOffset().longValue()) {
                        RemotingCommand response = RemotingCommand.createResponseCommand(PullMessageResponseHeader.class);
                        final PullMessageResponseHeader responseHeader = (PullMessageResponseHeader) response.readCustomHeader();
                        response.setOpaque(request.getOpaque());

                        responseHeader.setMinOffset(deFiBrokerController.getMessageStore().getMinOffsetInQueue(requestHeader.getTopic(), requestHeader.getQueueId()));
                        responseHeader.setMaxOffset(deFiBrokerController.getMessageStore().getMaxOffsetInQueue(requestHeader.getTopic(), requestHeader.getQueueId()));
                        responseHeader.setNextBeginOffset(nextBeginOffset);
                        responseHeader.setSuggestWhichBrokerId(MixAll.MASTER_ID);
                        response.setCode(ResponseCode.PULL_NOT_FOUND);
                        response.setRemark("lock a queue success, update pull offset.");
                        LOG.info("update pull offset from [{}] to [{}] after client acquire a queue. clientId:{}, queueId:{}, topic:{}, group:{}",
                            requestHeader.getQueueOffset(), nextBeginOffset, clientId, requestHeader.getQueueId(),
                            requestHeader.getTopic(), requestHeader.getConsumerGroup());
                        return response;
                    }
                    else {
                        LOG.info("no need to update pull offset. clientId:{}, queueId:{}, topic:{}, group:{}, request offset: {}",
                            clientId, requestHeader.getQueueId(), requestHeader.getTopic(), requestHeader.getConsumerGroup(), requestHeader.getQueueOffset());
                    }
                }
            }
        }
    }
    //...
    return response;
}
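Stripped of the response-building details, the listing above makes a three-way decision on every pull request. The sketch below captures only that decision. It is an illustration under assumptions: PullGate, Outcome, and decide are hypothetical names, and correctedOffset stands for whatever correctRequestOffset returns, whose implementation is not shown in this article.

```java
// Hypothetical summary of the branching in processRequest; not a class from DeFiBus.
final class PullGate {
    enum Outcome {
        REJECT,        // Queue is locked by another client: empty reply, offset unchanged
        RESET_OFFSET,  // lock just acquired and offset is stale: empty reply with corrected offset
        PROCEED        // fall through to the normal pull handling
    }

    static Outcome decide(boolean isAllowed, boolean acquired, long requestOffset, long correctedOffset) {
        if (!isAllowed) {
            return Outcome.REJECT;
        }
        // Only a freshly acquired lock triggers the offset check.
        if (acquired && correctedOffset != requestOffset) {
            return Outcome.RESET_OFFSET;
        }
        return Outcome.PROCEED;
    }

    public static void main(String[] args) {
        System.out.println(decide(false, false, 10, 10)); // REJECT
        System.out.println(decide(true, true, 10, 25));   // RESET_OFFSET
        System.out.println(decide(true, false, 10, 25));  // PROCEED
    }
}
```

In the listing, both REJECT and RESET_OFFSET are sent back with ResponseCode.PULL_NOT_FOUND; they differ only in the value passed to setNextBeginOffset, which is what steers the client's next pull.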
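In high-volume scenarios, these Broker-side changes effectively cut down on pointless duplicate deliveries, which matters a great deal for saving network resources. The change does cost a little server-side performance, but on balance the benefits far outweigh the drawbacks. The feature is also highly general and can be applied to other projects. That said, even with these substantial Broker-side changes, messages can still be delivered more than once in cases such as retries, so the consumer side must still make its processing idempotent.
-- All source code quoted in this article comes from WeBank's open-source project DeFiBus.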