前言:

上文对Leader节点处理非事务请求有过分析之后,本文就来看下真正的重头戏:事务请求的处理过程。

当然,有关于Leader的processor还是那些:PrepRequestProcessor -> ProposalRequestProcessor -> CommitProcessor -> ToBeAppliedRequestProcessor -> FinalRequestProcessor。我们直接来分析下其处理过程。

事务请求有很多,笔者挑一个比较典型的:create请求,其他类型请求都比较类似,不再赘述。

1.PrepRequestProcessor

public class PrepRequestProcessor extends ZooKeeperCriticalThread implements RequestProcessor {
    protected void pRequest(Request request) throws RequestProcessorException {
        request.hdr = null;
        request.txn = null;
        
        try {
            switch (request.type) {
                case OpCode.create:
                CreateRequest createRequest = new CreateRequest();
                pRequest2Txn(request.type, zks.getNextZxid(), request, createRequest, true);
                break;
                ...    
            }
        }
        // 最终交由下一个processor处理
        request.zxid = zks.getZxid();
        nextProcessor.processRequest(request);
    }
    
    // 具体处理在这里
    protected void pRequest2Txn(int type, long zxid, Request request, Record record, boolean deserialize)
        throws KeeperException, IOException, RequestProcessorException
    {
        request.hdr = new TxnHeader(request.sessionId, request.cxid, zxid,
                                    Time.currentWallTime(), type);

        switch (type) {
            case OpCode.create:                
                zks.sessionTracker.checkSession(request.sessionId, request.getOwner());
                CreateRequest createRequest = (CreateRequest)record;   
                if(deserialize)
                    // 将客户端的请求体反序列化到CreateRequest对象中
                    ByteBufferInputStream.byteBuffer2Record(request.request, createRequest);
                // path检查
                String path = createRequest.getPath();
                int lastSlash = path.lastIndexOf('/');
                if (lastSlash == -1 || path.indexOf('\0') != -1 || failCreate) {
                    LOG.info("Invalid path " + path + " with session 0x" +
                            Long.toHexString(request.sessionId));
                    throw new KeeperException.BadArgumentsException(path);
                }
                // ACL权限检查
                List<ACL> listACL = removeDuplicates(createRequest.getAcl());
                if (!fixupACL(request.authInfo, listACL)) {
                    throw new KeeperException.InvalidACLException(path);
                }
                String parentPath = path.substring(0, lastSlash);
                ChangeRecord parentRecord = getRecordForPath(parentPath);

                checkACL(zks, parentRecord.acl, ZooDefs.Perms.CREATE,
                        request.authInfo);
                int parentCVersion = parentRecord.stat.getCversion();
                // 根据创建节点类型,重置path信息
                CreateMode createMode =
                    CreateMode.fromFlag(createRequest.getFlags());
                if (createMode.isSequential()) {
                    path = path + String.format(Locale.ENGLISH, "%010d", parentCVersion);
                }
                validatePath(path, request.sessionId);
                try {
                    if (getRecordForPath(path) != null) {
                        throw new KeeperException.NodeExistsException(path);
                    }
                } catch (KeeperException.NoNodeException e) {
                    // ignore this one
                }
                // 检查父节点是否临时节点
                boolean ephemeralParent = parentRecord.stat.getEphemeralOwner() != 0;
                if (ephemeralParent) {
                    throw new KeeperException.NoChildrenForEphemeralsException(path);
                }
                int newCversion = parentRecord.stat.getCversion()+1;
                
                // 补充request的txn对象信息,后续requestProcessor会用到
                request.txn = new CreateTxn(path, createRequest.getData(),
                        listACL,
                        createMode.isEphemeral(), newCversion);
                StatPersisted s = new StatPersisted();
                if (createMode.isEphemeral()) {
                    s.setEphemeralOwner(request.sessionId);
                }
                // 修改父节点的stat信息
                parentRecord = parentRecord.duplicate(request.hdr.getZxid());
                parentRecord.childCount++;
                parentRecord.stat.setCversion(newCversion);
                addChangeRecord(parentRecord);
                addChangeRecord(new ChangeRecord(request.hdr.getZxid(), path, s,
                        0, listACL));
                break;
        }
        ...
        
}

在这里的处理与之前分析单机版的节点处理没有任何区别,主要就是对权限ACL、路径等的校验,后续交由ProposalRequestProcessor 处理

2.ProposalRequestProcessor 

public class ProposalRequestProcessor implements RequestProcessor {
 
    public void processRequest(Request request) throws RequestProcessorException {
        // 如果请求来自leaner
        if(request instanceof LearnerSyncRequest){
            zks.getLeader().processSync((LearnerSyncRequest)request);
        } else {
            	// 事务和非事务请求都会将该请求流转到下一个processor(CommitProcessor ),
                nextProcessor.processRequest(request);
            // 而针对事务请求的话(事务请求头不为空),则还需要进行事务投票等动作,在这里与之前非事务请求有所不同
            if (request.hdr != null) {
                try {
                    // 针对事务请求发起一次propose,具体在2.1
                    zks.getLeader().propose(request);
                } catch (XidRolloverException e) {
                    throw new RequestProcessorException(e.getMessage(), e);
                }
                // 将本次事务请求记录到事务日志中去,之前有过SyncProcessor的分析,这里不再赘述
                syncProcessor.processRequest(request);
            }
        }
    }
}

2.1 Leader针对事务请求发起propose

public class Leader {
 
    public Proposal propose(Request request) throws XidRolloverException {
        // 可以关注下这个bug
        if ((request.zxid & 0xffffffffL) == 0xffffffffL) {
            String msg =
                    "zxid lower 32 bits have rolled over, forcing re-election, and therefore new epoch start";
            shutdown(msg);
            throw new XidRolloverException(msg);
        }
        byte[] data = SerializeUtils.serializeRequest(request);
        proposalStats.setLastProposalSize(data.length);
        // 封装一个PROPOSAL类型的packet
        QuorumPacket pp = new QuorumPacket(Leader.PROPOSAL, request.zxid, data, null);
        
        Proposal p = new Proposal();
        p.packet = pp;
        p.request = request;
        synchronized (this) {
            if (LOG.isDebugEnabled()) {
                LOG.debug("Proposing:: " + request);
            }

            lastProposed = p.packet.getZxid();
            outstandingProposals.put(lastProposed, p);
            // 最终将proposal包发送到followers中
            sendPacket(pp);
        }
        return p;
    }
    
    // 发送proposal到所有的follower中去
    void sendPacket(QuorumPacket qp) {
        synchronized (forwardingFollowers) {
            for (LearnerHandler f : forwardingFollowers) { 
                // 最终交由每个LearnerHandler来处理
                f.queuePacket(qp);
            }
        }
    }
}

总结:有关于ProposalRequestProcessor的处理,一方面将请求交由下一个processor(CommitProcessor)来处理,另一方面将请求包装为proposal发送给所有的follower,等待follower处理完成返回ack;

2.2 Leader发送proposal到followers

Leader将请求包装为proposal,最终交由LearnerHandler来发送。发送就是正常的发送即可,我们来看下接收follower的响应(ack)的相关逻辑

public class LearnerHandler extends ZooKeeperThread {
    @Override
    public void run() {
     	...
        while (true) {
            qp = new QuorumPacket();
            ia.readRecord(qp, "packet");
         	ByteBuffer bb;
            long sessionId;
            int cxid;
            int type;

            // 接收到响应
            switch (qp.getType()) {
                // ACK类型,说明follower已经完成该次请求事务日志的记录    
                case Leader.ACK:
                    if (this.learnerType == LearnerType.OBSERVER) {
                        if (LOG.isDebugEnabled()) {
                            LOG.debug("Received ACK from Observer  " + this.sid);
                        }
                    }
                    syncLimitCheck.updateAck(qp.getZxid());
                    // leader计算是否已经有足够的follower返回ack
                    leader.processAck(this.sid, qp.getZxid(), sock.getLocalSocketAddress());
                    break;   
                    ...
            }
        }
    }
}

2.3 leader收集follower关于本次proposal的投票

public class Leader {
	synchronized public void processAck(long sid, long zxid, SocketAddress followerAddr) {
        ...
        Proposal p = outstandingProposals.get(zxid);
        if (p == null) {
            LOG.warn("Trying to commit future proposal: zxid 0x{} from {}",
                    Long.toHexString(zxid), followerAddr);
            return;
        }
        
        // 当当前响应ack的follower的sid添加到Proposal的ackSet中
        p.ackSet.add(sid);
       
        // 是否已经有足够的follower返回ack
        if (self.getQuorumVerifier().containsQuorum(p.ackSet)){             
            if (zxid != lastCommitted+1) {
                LOG.warn("Commiting zxid 0x{} from {} not first!",
                        Long.toHexString(zxid), followerAddr);
                LOG.warn("First is 0x{}", Long.toHexString(lastCommitted + 1));
            }
            outstandingProposals.remove(zxid);
            // 本次proposal已经被多数follower通过,可以进行commit
            // 先添加到toBeApplied中
            if (p.request != null) {
                toBeApplied.add(p);
            }

            if (p.request == null) {
                LOG.warn("Going to commmit null request for proposal: {}", p);
            }
            // leader向所有的follower发送commit命令,以提交本次proposal
            commit(zxid);
            inform(p);
            
            // 将本次请求添加到CommitProcessor.committedRequests集合中
            zk.commitProcessor.commit(p.request);
            if(pendingSyncs.containsKey(zxid)){
                for(LearnerSyncRequest r: pendingSyncs.remove(zxid)) {
                    sendSync(r);
                }
            }
        }
    }
}

总结:整个proposal投票的过程主要分为以下几个步骤:

1)leader针对事务请求发起投票,生成proposal,发送给所有的follower

2)follower接收proposal,处理完成后,返回ack给leader

3)leader收集所有的ack,如果多数follower已经返回ack,则判定本次请求通过,可以进行提交

4)leader向所有的follower发送commit请求,follower提交该proposal

3.CommitProcessor 

既然主要的事情都让ProposalRequestProcessor 做了,那CommitProcessor还做什么呢?

leader到目前为止只是将事务请求记录到事务日志,但是并没有添加到当前ZKDatabase中,那什么时候添加呢?最终会交由FinalRequestProcessor来添加。那添加的时机是什么时候呢?这个由CommitProcessor来把握 ,其主要作用在此。

public class CommitProcessor extends ZooKeeperCriticalThread implements RequestProcessor {
 
    // leader获取的请求集合
    LinkedList<Request> queuedRequests = new LinkedList<Request>();
    // 已经被follower 提交的请求集合
    LinkedList<Request> committedRequests = new LinkedList<Request>();
    
    public void run() {
        try {
            Request nextPending = null;            
            while (!finished) {
                int len = toProcess.size();
                for (int i = 0; i < len; i++) {
                    // 5.请求proposal已完成,交由下个processor处理即可
                    nextProcessor.processRequest(toProcess.get(i));
                }
                toProcess.clear();
                synchronized (this) {
                    // 2.若没有收到足够的follower ack,则等待
                    if ((queuedRequests.size() == 0 || nextPending != null)
                            && committedRequests.size() == 0) {
                        wait();
                        continue;
                    }
                    // 3.committedRequests不为空,说明已经收到足够的follower ack,follower已经commit本次请求
                    if ((queuedRequests.size() == 0 || nextPending != null)
                            && committedRequests.size() > 0) {
                        Request r = committedRequests.remove();
                        if (nextPending != null
                                && nextPending.sessionId == r.sessionId
                                && nextPending.cxid == r.cxid) {
                            nextPending.hdr = r.hdr;
                            nextPending.txn = r.txn;
                            nextPending.zxid = r.zxid;
                            // 4.则针对leader而言,本次请求可以提交给下个processor处理
                            toProcess.add(nextPending);
                            nextPending = null;
                        } else {
                            // this request came from someone else so just
                            // send the commit packet
                            toProcess.add(r);
                        }
                    }
                }

                // We haven't matched the pending requests, so go back to
                // waiting
                if (nextPending != null) {
                    continue;
                }

                // 1.请求达到时,nextPending被设置为当前request,下次循环时会使用到
                synchronized (this) {
                    // Process the next requests in the queuedRequests
                    while (nextPending == null && queuedRequests.size() > 0) {
                        Request request = queuedRequests.remove();
                        switch (request.type) {
                        case OpCode.create:
                        case OpCode.delete:
                        case OpCode.setData:
                        case OpCode.multi:
                        case OpCode.setACL:
                        case OpCode.createSession:
                        case OpCode.closeSession:
                            nextPending = request;
                            break;
                        case OpCode.sync:
                            if (matchSyncs) {
                                nextPending = request;
                            } else {
                                toProcess.add(request);
                            }
                            break;
                        default:
                            toProcess.add(request);
                        }
                    }
                }
            }
        } catch (InterruptedException e) {
            LOG.warn("Interrupted exception while waiting", e);
        } catch (Throwable e) {
            LOG.error("Unexpected exception causing CommitProcessor to exit", e);
        }
        LOG.info("CommitProcessor exited loop!");
    }
}

读者可以按照方法中标注的数字顺序来看代码,这样整个流程就顺了。

4.ToBeAppliedRequestProcessor 

static class ToBeAppliedRequestProcessor implements RequestProcessor {
    private RequestProcessor next;
    private ConcurrentLinkedQueue<Proposal> toBeApplied;

    public void processRequest(Request request) throws RequestProcessorException {
        // request.addRQRec(">tobe");
        next.processRequest(request);
        Proposal p = toBeApplied.peek();
        if (p != null && p.request != null
            && p.request.zxid == request.zxid) {
            toBeApplied.remove();
        }
    }
}

代码非常简单,貌似ToBeAppliedRequestProcessor拦截到了一个寂寞,基本啥也没做,直接交由最后一个processor处理了

5.FinalRequestProcessor

public class FinalRequestProcessor implements RequestProcessor {
 
    public void processRequest(Request request) {
     	ProcessTxnResult rc = null;
        synchronized (zks.outstandingChanges) {
            ...
            if (request.hdr != null) {
               TxnHeader hdr = request.hdr;
               Record txn = request.txn;

                // 真正的创建该节点,添加到ZKDatabase中
               rc = zks.processTxn(hdr, txn);
            }
            // 以上都完成后,将本次事务请求放入committedProposal队列中
            if (Request.isQuorum(request.type)) {
                zks.getZKDatabase().addCommittedProposal(request);
            }
        }
        switch (request.type) {
            // 针对create请求,返回CreateResponse响应即可
            case OpCode.create: {
                lastOp = "CREA";
                rsp = new CreateResponse(rc.path);
                err = Code.get(rc.err);
                break;
            }    
        }
    }
}

所以最终创建节点的动作还是由FinalRequestProcessor来完成,不做多分析,跟之前单机版处理过程类似。

总结:

leader节点处理一次事务请求的过程还是蛮复杂的,主要过程在于针对事务请求的proposal投票及收集投票响应(ack)的过程,这是相对于非事务请求的不同点。

还是借用<<从Paxos到Zookeeper 分布式一致性原理与实践>> 的一张图来总结下整个过程:

Zookeeper事务id zookeeper处理事务型请求_zookeeper