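Preface:
The previous article analyzed the transaction log. ZooKeeper has a second important kind of log: the snapshot log.
A snapshot log is essentially a snapshot of all ZooKeeper node information, written from memory to disk.
1. Viewing the snapshot log
By default, snapshot logs are stored under %ZOOKEEPER_DIR%/data/. A snapshot log file, snapshot.123c2, was generated in my directory.
Like the transaction log, this is a binary file and cannot be read directly. ZooKeeper provides a viewer class for it, org.apache.zookeeper.server.SnapshotFormatter: pass the path of the snapshot file to its main() method. Viewing snapshot.123c2 produced the following output: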
ZNode Details (count=74685):
----
/
cZxid = 0x00000000000000
ctime = Thu Jan 01 08:00:00 GMT+08:00 1970
mZxid = 0x00000000000000
mtime = Thu Jan 01 08:00:00 GMT+08:00 1970
pZxid = 0x000000000123c2
cversion = 74681
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x00000000000000
dataLength = 0
----
/hello24507
cZxid = 0x00000000004c3b
ctime = Tue Oct 05 17:15:30 GMT+08:00 2021
mZxid = 0x00000000004c3b
mtime = Tue Oct 05 17:15:30 GMT+08:00 2021
pZxid = 0x00000000004c3b
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x00000000000000
dataLength = 10
----
/hello24508
cZxid = 0x00000000004c3c
ctime = Tue Oct 05 17:15:30 GMT+08:00 2021
mZxid = 0x00000000004c3c
mtime = Tue Oct 05 17:15:30 GMT+08:00 2021
pZxid = 0x00000000004c3c
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x00000000000000
dataLength = 10
----
...
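These entries are the basic metadata of each node (the node values themselves are not shown).
2. Entry point for snapshot generation
Where is the snapshot log generated? As covered in the earlier article on viewing and analyzing the transaction log, it is generated in SyncRequestProcessor. The code is as follows: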
public class SyncRequestProcessor extends ZooKeeperCriticalThread implements RequestProcessor {
    // Defaults to 100000; used below
    private static int snapCount = ZooKeeperServer.getSnapCount();
    // Set later, in run()
    private static int randRoll;
    public void run() {
        try {
            int logCount = 0;
            // Set randRoll to a random value in [0, snapCount / 2)
            setRandRoll(r.nextInt(snapCount/2));
            while (true) {
                Request si = null;
                if (toFlush.isEmpty()) {
                    si = queuedRequests.take();
                } else {
                    si = queuedRequests.poll();
                    if (si == null) {
                        flush(toFlush);
                        continue;
                    }
                }
                if (si == requestOfDeath) {
                    break;
                }
                if (si != null) {
                    // track the number of records written to the log
                    if (zks.getZKDatabase().append(si)) {
                        logCount++;
                        // After appending to the transaction log, check whether the number of appended
                        // transactions exceeds snapCount / 2 + randRoll. Since randRoll is in [0, snapCount / 2),
                        // with the default snapCount of 100000 a snapshot is taken after 50001 to 100000 transactions
                        if (logCount > (snapCount / 2 + randRoll)) {
                            setRandRoll(r.nextInt(snapCount/2));
                            // roll the log
                            zks.getZKDatabase().rollLog();
                            // take a snapshot
                            if (snapInProcess != null && snapInProcess.isAlive()) {
                                LOG.warn("Too busy to snap, skipping");
                            } else {
                                snapInProcess = new ZooKeeperThread("Snapshot Thread") {
                                    public void run() {
                                        try {
                                            // Runs on a dedicated thread so that snapshotting does not block this loop
                                            zks.takeSnapshot();
                                        } catch(Exception e) {
                                            LOG.warn("Unexpected exception", e);
                                        }
                                    }
                                };
                                snapInProcess.start();
                            }
                            logCount = 0;
                        }
                    } else if (toFlush.isEmpty()) {
                        // optimization for read heavy workloads
                        // iff this is a read, and there are no pending
                        // flushes (writes), then just pass this to the next
                        // processor
                        if (nextProcessor != null) {
                            nextProcessor.processRequest(si);
                            if (nextProcessor instanceof Flushable) {
                                ((Flushable)nextProcessor).flush();
                            }
                        }
                        continue;
                    }
                    toFlush.add(si);
                    if (toFlush.size() > 1000) {
                        flush(toFlush);
                    }
                }
            }
        } catch (Throwable t) {
            handleException(this.getName(), t);
            running = false;
        }
        LOG.info("SyncRequestProcessor exited!");
    }
}
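As shown, the entry point for snapshot generation is SyncRequestProcessor, which starts a separate thread to write the snapshot. (snapCount is configurable; I lowered it for testing, since otherwise it takes a long time before a snapshot log is generated.)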
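To make the trigger condition concrete, here is a minimal sketch that isolates the `logCount > snapCount / 2 + randRoll` check. The class `SnapshotTrigger` and its method names are hypothetical and exist only for illustration; only the threshold arithmetic is taken from the code above.

```java
import java.util.Random;

// Hypothetical helper, not part of ZooKeeper: it isolates the threshold
// arithmetic used in SyncRequestProcessor.run().
class SnapshotTrigger {
    private final int snapCount;
    private final Random random;
    private int randRoll;
    private int logCount = 0;

    SnapshotTrigger(int snapCount, long seed) {
        this.snapCount = snapCount;
        this.random = new Random(seed);
        this.randRoll = random.nextInt(snapCount / 2);
    }

    // Records one appended transaction; returns true when a snapshot would be taken.
    boolean onTransactionAppended() {
        logCount++;
        if (logCount > snapCount / 2 + randRoll) {
            randRoll = random.nextInt(snapCount / 2); // re-roll for the next round
            logCount = 0;
            return true;
        }
        return false;
    }

    // Counts how many transactions are appended before the next snapshot fires.
    int transactionsUntilNextSnapshot() {
        int n = 0;
        do {
            n++;
        } while (!onTransactionAppended());
        return n;
    }

    public static void main(String[] args) {
        SnapshotTrigger trigger = new SnapshotTrigger(100000, 42L);
        for (int i = 0; i < 5; i++) {
            System.out.println("snapshot after "
                    + trigger.transactionsUntilNextSnapshot() + " transactions");
        }
    }
}
```

With snapCount = 100000, every interval falls between 50001 and 100000 transactions. The random offset means that servers in the same ensemble are unlikely to all take their snapshots at the same moment.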
3. ZooKeeperServer.takeSnapshot(): generating the snapshot log
public class ZooKeeperServer implements SessionExpirer, ServerStats.Provider {
    public void takeSnapshot(){
        try {
            // Delegates directly to FileTxnSnapLog.save(); see 3.1
            txnLogFactory.save(zkDb.getDataTree(), zkDb.getSessionWithTimeOuts());
        } catch (IOException e) {
            LOG.error("Severe unrecoverable error, exiting", e);
            // This is a severe error that we cannot recover from,
            // so we need to exit
            System.exit(10);
        }
    }
}
3.1 FileTxnSnapLog.save()
public class FileTxnSnapLog {
    public void save(DataTree dataTree,
                     ConcurrentHashMap<Long, Integer> sessionsWithTimeouts)
            throws IOException {
        // Take the most recently processed zxid and derive the file name from it
        long lastZxid = dataTree.lastProcessedZxid;
        File snapshotFile = new File(snapDir, Util.makeSnapshotName(lastZxid));
        LOG.info("Snapshotting: 0x{} to {}", Long.toHexString(lastZxid),
                snapshotFile);
        // Serialize the in-memory DataTree to the snapshot file
        snapLog.serialize(dataTree, sessionsWithTimeouts, snapshotFile);
    }
}
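The file seen earlier, snapshot.123c2, matches the root node's pZxid of 0x123c2, which suggests that the file name is the prefix `snapshot.` followed by the zxid in hexadecimal. Assuming that scheme, the mapping in both directions can be sketched as follows. `SnapshotNames` is a hypothetical helper, not the ZooKeeper `Util` class.

```java
// Hypothetical helper illustrating the assumed naming scheme
// "snapshot.<zxid in hex>"; it is not ZooKeeper's own Util class.
class SnapshotNames {
    static final String PREFIX = "snapshot.";

    // Builds a snapshot file name from a zxid.
    static String makeSnapshotName(long zxid) {
        return PREFIX + Long.toHexString(zxid);
    }

    // Recovers the zxid from a snapshot file name.
    static long zxidFromName(String name) {
        if (!name.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not a snapshot file name: " + name);
        }
        // Unsigned parsing keeps the round trip correct even if the top bit is set.
        return Long.parseUnsignedLong(name.substring(PREFIX.length()), 16);
    }

    public static void main(String[] args) {
        String name = makeSnapshotName(0x123c2L);
        System.out.println(name);
        System.out.println(Long.toHexString(zxidFromName(name)));
    }
}
```

Because the zxid is part of the name, the most recent snapshot can be found by comparing file names alone, without opening any file.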
3.2 FileSnap.serialize(): the actual serialization
public class FileSnap implements SnapShot {
    public synchronized void serialize(DataTree dt, Map<Long, Integer> sessions, File snapShot)
            throws IOException {
        if (!close) {
            OutputStream sessOS = new BufferedOutputStream(new FileOutputStream(snapShot));
            CheckedOutputStream crcOut = new CheckedOutputStream(sessOS, new Adler32());
            //CheckedOutputStream cout = new CheckedOutputStream()
            OutputArchive oa = BinaryOutputArchive.getArchive(crcOut);
            // As with the transaction log, start by creating the file header
            FileHeader header = new FileHeader(SNAP_MAGIC, VERSION, dbId);
            // Serialize the DataTree; see 3.2.1
            serialize(dt,sessions,oa, header);
            // Append the checksum of everything written so far
            long val = crcOut.getChecksum().getValue();
            oa.writeLong(val, "val");
            oa.writeString("/", "path");
            sessOS.flush();
            crcOut.close();
            sessOS.close();
        }
    }
}
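The checksum handling above follows a common pattern: write the payload through a CheckedOutputStream, read the running Adler-32 value, and append it as a trailer. The sketch below shows that pattern with the JDK alone, together with the matching verification on read. `ChecksumTrailer` and its length-prefixed payload layout are assumptions made for illustration and do not reproduce the real snapshot file format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.Adler32;
import java.util.zip.CheckedInputStream;
import java.util.zip.CheckedOutputStream;

// Hypothetical example of a checksum trailer; the layout is illustrative only.
class ChecksumTrailer {
    // Writes [length][payload][adler32 of everything before the trailer].
    static byte[] write(byte[] payload) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        CheckedOutputStream checked = new CheckedOutputStream(bytes, new Adler32());
        try (DataOutputStream out = new DataOutputStream(checked)) {
            out.writeInt(payload.length);
            out.write(payload);
            // Read the checksum before writing it, so the trailer does not cover itself.
            long checksum = checked.getChecksum().getValue();
            out.writeLong(checksum);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the payload back and compares the stored checksum with a recomputed one.
    static byte[] readAndVerify(byte[] file) {
        CheckedInputStream checked =
                new CheckedInputStream(new ByteArrayInputStream(file), new Adler32());
        try (DataInputStream in = new DataInputStream(checked)) {
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            long computed = checked.getChecksum().getValue();
            long stored = in.readLong();
            if (computed != stored) {
                throw new IllegalStateException("checksum mismatch");
            }
            return payload;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] file = write("snapshot body".getBytes());
        System.out.println(new String(readAndVerify(file)));
    }
}
```

A reader that recomputes the checksum this way can detect a truncated or corrupted snapshot before trusting its contents.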
3.2.1 SerializeUtils.serializeSnapshot()
public class SerializeUtils {
    public static void serializeSnapshot(DataTree dt,OutputArchive oa,
                                         Map<Long, Integer> sessions) throws IOException {
        HashMap<Long, Integer> sessSnap = new HashMap<Long, Integer>(sessions);
        // First write the session table: the entry count, then each sessionId --> timeout pair
        oa.writeInt(sessSnap.size(), "count");
        for (Entry<Long, Integer> entry : sessSnap.entrySet()) {
            oa.writeLong(entry.getKey().longValue(), "id");
            oa.writeInt(entry.getValue().intValue(), "timeout");
        }
        // Then delegate to DataTree's own serialization method
        dt.serialize(oa, "tree");
    }
}
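The session table layout (a count, followed by one id and one timeout per session) can be illustrated with a small round trip. This sketch assumes the binary archive writes fixed-width big-endian integers, as `java.io.DataOutputStream` does; `SessionTableCodec` is a hypothetical name and not a ZooKeeper class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical codec for the count + (id, timeout) layout described above.
class SessionTableCodec {
    static byte[] write(Map<Long, Integer> sessions) {
        // Copy first, so the count written matches the entries written
        // even if the source map changes while we iterate.
        Map<Long, Integer> copy = new HashMap<>(sessions);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(copy.size());
            for (Map.Entry<Long, Integer> entry : copy.entrySet()) {
                out.writeLong(entry.getKey());   // session id, 8 bytes
                out.writeInt(entry.getValue());  // timeout, 4 bytes
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Map<Long, Integer> read(byte[] data) {
        Map<Long, Integer> sessions = new HashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                long id = in.readLong();
                int timeout = in.readInt();
                sessions.put(id, timeout);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sessions;
    }

    public static void main(String[] args) {
        Map<Long, Integer> sessions = new HashMap<>();
        sessions.put(0x1000001L, 30000);
        sessions.put(0x1000002L, 40000);
        byte[] data = write(sessions);
        System.out.println(data.length + " bytes, round trip ok: " + read(data).equals(sessions));
    }
}
```

Under these assumptions, the session section occupies 4 + 12 × n bytes for n sessions.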
3.2.2 DataTree.serialize(): serializing the DataTree contents
public class DataTree {
    public void serialize(OutputArchive oa, String tag) throws IOException {
        scount = 0;
        aclCache.serialize(oa);
        serializeNode(oa, new StringBuilder(""));
        // / marks end of stream
        // we need to check if clear had been called in between the snapshot.
        if (root != null) {
            oa.writeString("/", "path");
        }
    }
}
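The details of DataTree serialization are not covered here; readers can consult the source. In short, every node in the DataTree is serialized to the file one by one, including its path, value, ACL, and related information.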
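As a rough mental model of the node-by-node walk, the following toy sketch emits paths in depth-first order, starting from the empty root path passed to serializeNode() above, and finishes with the "/" end marker. It is an assumption-laden simplification: `ToyTree` is hypothetical, it records only paths (no data, ACLs, or stat fields), and it sorts children to make the output deterministic, which the real DataTree need not do.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy model of a path tree; not ZooKeeper's DataTree.
class ToyTree {
    // Maps each node path to the names of its children; the root path is "".
    private final Map<String, SortedSet<String>> children = new HashMap<>();

    ToyTree() {
        children.put("", new TreeSet<>());
    }

    // Creates a node such as "/a/b"; the parent must already exist.
    void create(String path) {
        int slash = path.lastIndexOf('/');
        String parent = path.substring(0, slash);
        SortedSet<String> siblings = children.get(parent);
        if (siblings == null) {
            throw new IllegalArgumentException("no parent: " + parent);
        }
        siblings.add(path.substring(slash + 1));
        children.putIfAbsent(path, new TreeSet<>());
    }

    // Emits every path depth-first, then "/" as the end-of-stream marker.
    List<String> serializePaths() {
        List<String> out = new ArrayList<>();
        serializeNode(out, "");
        out.add("/");
        return out;
    }

    private void serializeNode(List<String> out, String path) {
        SortedSet<String> kids = children.get(path);
        if (kids == null) {
            return; // node no longer exists
        }
        out.add(path); // the real code would also write the node record here
        for (String kid : kids) {
            serializeNode(out, path + "/" + kid);
        }
    }

    public static void main(String[] args) {
        ToyTree tree = new ToyTree();
        tree.create("/a");
        tree.create("/a/b");
        tree.create("/c");
        System.out.println(tree.serializePaths());
    }
}
```

Since the root is written under the empty path, "/" never appears as a real node path in this model, which is what lets it double as the terminator.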
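Summary:
The flow is not complicated. Unlike the transaction log, the snapshot file does not need to be padded in advance; all of the data is written to disk in one pass when the snapshot is taken.
DataTree is a core part of ZooKeeper. At startup, the node information stored in the files is deserialized into the DataTree so that client requests can be served quickly from memory, and every transaction also updates the DataTree in real time. The next article will cover the DataTree in detail.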