Preface:

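The previous article analyzed the transaction log. ZooKeeper has one more important log: the snapshot log.

A snapshot log is essentially a snapshot of all of ZooKeeper's node information, written from memory to disk.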

1. Viewing the snapshot log

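In my setup, snapshot logs are stored under the %ZOOKEEPER_DIR%/data/ folder (the location is determined by the configured dataDir), and one snapshot file, snapshot.123c2, had been generated there.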

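Like the transaction log, this is a binary file and cannot be read directly. ZooKeeper provides a viewer class for it, org.apache.zookeeper.server.SnapshotFormatter; passing the snapshot file path as the argument to its main() method is all that is needed. Inspecting snapshot.123c2 produced the following output: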

ZNode Details (count=74685):
----
/
  cZxid = 0x00000000000000
  ctime = Thu Jan 01 08:00:00 GMT+08:00 1970
  mZxid = 0x00000000000000
  mtime = Thu Jan 01 08:00:00 GMT+08:00 1970
  pZxid = 0x000000000123c2
  cversion = 74681
  dataVersion = 0
  aclVersion = 0
  ephemeralOwner = 0x00000000000000
  dataLength = 0
----
/hello24507
  cZxid = 0x00000000004c3b
  ctime = Tue Oct 05 17:15:30 GMT+08:00 2021
  mZxid = 0x00000000004c3b
  mtime = Tue Oct 05 17:15:30 GMT+08:00 2021
  pZxid = 0x00000000004c3b
  cversion = 0
  dataVersion = 0
  aclVersion = 0
  ephemeralOwner = 0x00000000000000
  dataLength = 10
----
/hello24508
  cZxid = 0x00000000004c3c
  ctime = Tue Oct 05 17:15:30 GMT+08:00 2021
  mZxid = 0x00000000004c3c
  mtime = Tue Oct 05 17:15:30 GMT+08:00 2021
  pZxid = 0x00000000004c3c
  cversion = 0
  dataVersion = 0
  aclVersion = 0
  ephemeralOwner = 0x00000000000000
  dataLength = 10
----
...

As shown, each entry is the basic metadata of one node (the node's value itself is not displayed).

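2. Entry point for snapshot generation

Where is the snapshot log generated? As noted in the earlier article on inspecting and analyzing the transaction log, it happens in SyncRequestProcessor. The code is as follows: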

public class SyncRequestProcessor extends ZooKeeperCriticalThread implements RequestProcessor {
 
    // Defaults to 100000; used below
    private static int snapCount = ZooKeeperServer.getSnapCount();
    
    // Assigned later, in run()
    private static int randRoll;
    
    public void run() {
        try {
            int logCount = 0;

            // Set randRoll to a random value in [0, snapCount / 2)
            setRandRoll(r.nextInt(snapCount/2));
            while (true) {
                Request si = null;
                if (toFlush.isEmpty()) {
                    si = queuedRequests.take();
                } else {
                    si = queuedRequests.poll();
                    if (si == null) {
                        flush(toFlush);
                        continue;
                    }
                }
                if (si == requestOfDeath) {
                    break;
                }
                if (si != null) {
                    // track the number of records written to the log
                    if (zks.getZKDatabase().append(si)) {
                        logCount++;
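                        // After appending to the transaction log, check whether the number of appended
                        // transactions exceeds snapCount / 2 + randRoll. snapCount defaults to 100000,
                        // so a snapshot is taken only after more than 50000 transactions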
                        if (logCount > (snapCount / 2 + randRoll)) {
                            setRandRoll(r.nextInt(snapCount/2));
                            // roll the log
                            zks.getZKDatabase().rollLog();
                            // take a snapshot
                            if (snapInProcess != null && snapInProcess.isAlive()) {
                                LOG.warn("Too busy to snap, skipping");
                            } else {
                                snapInProcess = new ZooKeeperThread("Snapshot Thread") {
                                        public void run() {
                                            try {
                                                // Generate the snapshot in a separate thread
                                                zks.takeSnapshot();
                                            } catch(Exception e) {
                                                LOG.warn("Unexpected exception", e);
                                            }
                                        }
                                    };
                                snapInProcess.start();
                            }
                            logCount = 0;
                        }
                    } else if (toFlush.isEmpty()) {
                        // optimization for read heavy workloads
                        // iff this is a read, and there are no pending
                        // flushes (writes), then just pass this to the next
                        // processor
                        if (nextProcessor != null) {
                            nextProcessor.processRequest(si);
                            if (nextProcessor instanceof Flushable) {
                                ((Flushable)nextProcessor).flush();
                            }
                        }
                        continue;
                    }
                    toFlush.add(si);
                    if (toFlush.size() > 1000) {
                        flush(toFlush);
                    }
                }
            }
        } catch (Throwable t) {
            handleException(this.getName(), t);
            running = false;
        }
        LOG.info("SyncRequestProcessor exited!");
    }
}

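As shown, SyncRequestProcessor is the entry point for snapshot generation, and it starts a separate thread to do the work. (snapCount can be customized; I lowered it for testing, because otherwise it is hard to see a snapshot log being generated.)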
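
The trigger condition can be checked in isolation. The following is a minimal sketch that copies only the comparison logCount > (snapCount / 2 + randRoll) from the excerpt above; the class and method names are my own and are not part of ZooKeeper. It suggests that with the default snapCount of 100000, a snapshot is triggered somewhere between the 50001st and the 100000th appended transaction, depending on randRoll.

```java
import java.util.Random;

// Hypothetical helper class; only the comparison is taken from the excerpt.
final class SnapshotTriggerSketch {

    // Same condition as in SyncRequestProcessor.run()
    static boolean shouldSnapshot(int logCount, int snapCount, int randRoll) {
        return logCount > (snapCount / 2 + randRoll);
    }

    // Counts how many appended transactions it takes to cross the threshold
    static int txnsUntilSnapshot(int snapCount, int randRoll) {
        int logCount = 0;
        while (true) {
            logCount++;
            if (shouldSnapshot(logCount, snapCount, randRoll)) {
                return logCount;
            }
        }
    }

    public static void main(String[] args) {
        int snapCount = 100000;
        // randRoll is drawn from [0, snapCount / 2), as in the excerpt
        int randRoll = new Random().nextInt(snapCount / 2);
        System.out.println("randRoll=" + randRoll
                + ", snapshot triggered at transaction "
                + txnsUntilSnapshot(snapCount, randRoll));
    }
}
```

The random offset appears to be there so that servers in an ensemble do not all snapshot at the same moment; that is my reading of the code rather than something stated in the excerpt.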

3. ZooKeeperServer.takeSnapshot(): generating the snapshot log

public class ZooKeeperServer implements SessionExpirer, ServerStats.Provider {
    public void takeSnapshot(){

        try {
            // Delegates directly to FileTxnSnapLog.save(); see 3.1
            txnLogFactory.save(zkDb.getDataTree(), zkDb.getSessionWithTimeOuts());
        } catch (IOException e) {
            LOG.error("Severe unrecoverable error, exiting", e);
            // This is a severe error that we cannot recover from,
            // so we need to exit
            System.exit(10);
        }
    }
}

3.1 FileTxnSnapLog.save()

public class FileTxnSnapLog {
    public void save(DataTree dataTree,
            ConcurrentHashMap<Long, Integer> sessionsWithTimeouts)
        throws IOException {
        long lastZxid = dataTree.lastProcessedZxid;
        // Build the snapshot file name from the last processed zxid
        File snapshotFile = new File(snapDir, Util.makeSnapshotName(lastZxid));
        LOG.info("Snapshotting: 0x{} to {}", Long.toHexString(lastZxid),
                snapshotFile);
        // Serialize the in-memory DataTree
        snapLog.serialize(dataTree, sessionsWithTimeouts, snapshotFile);
        
    }
}
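
The file name ties back to the output in section 1: the file is snapshot.123c2, and the root node's pZxid is 0x123c2. The following is a minimal sketch of that naming scheme, assuming the name is the prefix "snapshot", a dot, and the zxid in lowercase hex. The class and its two methods are hypothetical stand-ins written for illustration, not ZooKeeper's own Util class.

```java
// Hypothetical stand-in for the naming scheme; not ZooKeeper's Util class.
final class SnapshotNameSketch {

    static final String PREFIX = "snapshot";

    // zxid -> file name, e.g. 0x123c2 -> "snapshot.123c2"
    static String makeSnapshotName(long zxid) {
        return PREFIX + "." + Long.toHexString(zxid);
    }

    // file name -> zxid; rejects names that do not carry the expected prefix
    static long zxidFromName(String name) {
        int dot = name.lastIndexOf('.');
        if (dot < 0 || !name.substring(0, dot).equals(PREFIX)) {
            throw new IllegalArgumentException("not a snapshot file name: " + name);
        }
        return Long.parseUnsignedLong(name.substring(dot + 1), 16);
    }

    public static void main(String[] args) {
        String name = makeSnapshotName(0x123c2L);
        System.out.println(name);
        System.out.println("0x" + Long.toHexString(zxidFromName(name)));
    }
}
```

Because the zxid is embedded in the name, the most recent snapshot can be found by comparing the parsed zxid values rather than file timestamps.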

3.2 FileSnap.serialize(): the actual serialization

public class FileSnap implements SnapShot {
    public synchronized void serialize(DataTree dt, Map<Long, Integer> sessions, File snapShot)
            throws IOException {
        if (!close) {
            OutputStream sessOS = new BufferedOutputStream(new FileOutputStream(snapShot));
            CheckedOutputStream crcOut = new CheckedOutputStream(sessOS, new Adler32());
            //CheckedOutputStream cout = new CheckedOutputStream()
            OutputArchive oa = BinaryOutputArchive.getArchive(crcOut);
            // As with the transaction log, create the file header first
            FileHeader header = new FileHeader(SNAP_MAGIC, VERSION, dbId);
            // Serialize the DataTree; see 3.2.1
            serialize(dt,sessions,oa, header);
            // Write the checksum value
            long val = crcOut.getChecksum().getValue();
            oa.writeLong(val, "val");
            oa.writeString("/", "path");
            sessOS.flush();
            crcOut.close();
            sessOS.close();
        }
    }
}
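
The checksum handling above uses only JDK classes, so it can be tried without ZooKeeper. The following is a minimal sketch of the same pattern: write the payload through a CheckedOutputStream backed by Adler32, then append the running checksum as a trailing long. The layout (an int length followed by the bytes) and the class and method names are simplifications of mine; the real snapshot format is the one produced by BinaryOutputArchive.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;
import java.util.zip.CheckedInputStream;
import java.util.zip.CheckedOutputStream;

// Hypothetical illustration of "payload + trailing Adler32"; not the real format.
final class ChecksumSketch {

    static byte[] writeWithChecksum(byte[] payload) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            CheckedOutputStream crcOut = new CheckedOutputStream(bytes, new Adler32());
            DataOutputStream out = new DataOutputStream(crcOut);
            out.writeInt(payload.length);
            out.write(payload);
            // Checksum of everything written so far, appended at the end
            long val = crcOut.getChecksum().getValue();
            out.writeLong(val);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static boolean verify(byte[] file) {
        try {
            CheckedInputStream crcIn =
                    new CheckedInputStream(new ByteArrayInputStream(file), new Adler32());
            DataInputStream in = new DataInputStream(crcIn);
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            // Capture the checksum before reading the stored value
            long computed = crcIn.getChecksum().getValue();
            long stored = in.readLong();
            return computed == stored;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] file = writeWithChecksum("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println("intact: " + verify(file));
        file[4] ^= 0x01; // flip one bit in the first payload byte
        System.out.println("corrupted: " + verify(file));
    }
}
```

The reader has to capture the computed checksum before it reads the stored one, because reading the trailing long also passes through the checked stream.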

3.2.1 SerializeUtils.serializeSnapshot()

public class SerializeUtils {
    public static void serializeSnapshot(DataTree dt,OutputArchive oa,
            Map<Long, Integer> sessions) throws IOException {
        HashMap<Long, Integer> sessSnap = new HashMap<Long, Integer>(sessions);
        // First write the sessionId --> timeout entries
        oa.writeInt(sessSnap.size(), "count");
        for (Entry<Long, Integer> entry : sessSnap.entrySet()) {
            oa.writeLong(entry.getKey().longValue(), "id");
            oa.writeInt(entry.getValue().intValue(), "timeout");
        }
        
        // Call DataTree's own serialize method
        dt.serialize(oa, "tree");
    }
}
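
The session section therefore has a simple layout: a count, followed by one (id, timeout) pair per session. The following is a minimal round-trip sketch of that layout, using DataOutputStream as a stand-in for OutputArchive. The class and method names are hypothetical, and the field tags ("count", "id", "timeout") are omitted because a plain data stream has no notion of them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the session section of a snapshot.
final class SessionSnapshotSketch {

    static byte[] write(Map<Long, Integer> sessions) {
        try {
            // Copy first, as the excerpt does, so the map cannot change mid-write
            Map<Long, Integer> copy = new HashMap<>(sessions);
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(copy.size());
            for (Map.Entry<Long, Integer> entry : copy.entrySet()) {
                out.writeLong(entry.getKey());
                out.writeInt(entry.getValue());
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Map<Long, Integer> read(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int count = in.readInt();
            Map<Long, Integer> sessions = new HashMap<>();
            for (int i = 0; i < count; i++) {
                long id = in.readLong();
                int timeout = in.readInt();
                sessions.put(id, timeout);
            }
            return sessions;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<Long, Integer> sessions = new HashMap<>();
        sessions.put(1L, 30000); // made-up session id and timeout
        sessions.put(2L, 40000);
        System.out.println(read(write(sessions)).equals(sessions));
    }
}
```

Each entry costs 12 bytes (an 8-byte id plus a 4-byte timeout) on top of the 4-byte count.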

3.2.2 DataTree.serialize(): serializing the DataTree contents

public class DataTree {
    public void serialize(OutputArchive oa, String tag) throws IOException {
        scount = 0;
        aclCache.serialize(oa);
        serializeNode(oa, new StringBuilder(""));
        // / marks end of stream
        // we need to check if clear had been called in between the snapshot.
        if (root != null) {
            oa.writeString("/", "path");
        }
    }
}

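I will not go into the details of DataTree serialization here; readers can look at the source themselves. In short, every node in the DataTree is serialized to the file one by one, including its path, value, ACL, and related information.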
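
To make the node-by-node idea concrete, here is a toy sketch of my own; it is not the real DataTree code. It assumes a depth-first walk that starts from the empty path "" (as in serializeNode(oa, new StringBuilder(""))), writes each node's path followed by its data, and finishes with "/" as the end-of-stream marker. ACLs and stat fields are left out, and writeUTF stands in for the archive's writeString.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy tree, for illustration only; the real DataTree stores far more per node.
final class TreeSnapshotSketch {

    static final class Node {
        final byte[] data;
        final TreeMap<String, Node> children = new TreeMap<>();
        Node(byte[] data) { this.data = data; }
    }

    // Depth-first: write this node, then each child under path + "/" + name
    static void writeNode(DataOutputStream out, Node node, String path) throws IOException {
        out.writeUTF(path);
        out.writeInt(node.data.length);
        out.write(node.data);
        for (Map.Entry<String, Node> child : node.children.entrySet()) {
            writeNode(out, child.getValue(), path + "/" + child.getKey());
        }
    }

    static byte[] serialize(Node root) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            writeNode(out, root, "");
            out.writeUTF("/"); // "/" marks the end of the stream
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads entries until the "/" marker and returns the paths in file order
    static List<String> readPaths(byte[] snapshot) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(snapshot));
            List<String> paths = new ArrayList<>();
            String path = in.readUTF();
            while (!path.equals("/")) {
                byte[] data = new byte[in.readInt()];
                in.readFully(data);
                paths.add(path);
                path = in.readUTF();
            }
            return paths;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Node sampleTree() {
        Node root = new Node(new byte[0]);
        root.children.put("hello24507", new Node(new byte[10]));
        root.children.put("hello24508", new Node(new byte[10]));
        return root;
    }

    public static void main(String[] args) {
        System.out.println(readPaths(serialize(sampleTree())));
    }
}
```

Since the root is written under the empty path, no node in this sketch is ever written as "/", which is what lets "/" double as the terminator.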

Summary:

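The flow is not complicated. Unlike the transaction log, a snapshot file does not need to be padded in advance; all of the data is written to disk in one pass when the snapshot is taken.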

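DataTree is one of ZooKeeper's core concepts. At startup, the node information in the files is deserialized into the DataTree; keeping it in memory lets the server respond to client requests faster, and every transaction is also applied to the DataTree in real time. The next article will cover DataTree in detail.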