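1. Kafka "High Throughput": Sequential Access and Zero Copy
https://cloud.tencent.com/developer/article/1476649
2. Kafka achieves efficient data transfer through zero copy
3. Kafka's zero-copy technique
https://www.jianshu.com/p/835ec2d4c170
4. What is "zero copy"?
https://baijiahao.baidu.com/s?id=1648595456047501430&wfr=spider&for=pc
Kafka uses zero copy when transferring data, and this greatly improves its throughput. Let's look at how zero copy is implemented in Kafka.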
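Conventional data transfer
Many web applications serve a large amount of static content, which amounts to reading data from disk and writing exactly the same data back to the response socket. This may seem to need relatively little CPU work, but it is somewhat inefficient: the kernel reads the data from disk and pushes it across the kernel-user boundary to the application, and the application then pushes it back across the kernel-user boundary to be written to the socket. In effect, the application acts as an inefficient intermediary that moves data from the disk file to the socket.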
[Figure: data path of a conventional transfer, from disk through the application to the socket]
In code:
File.read(fileDesc, buf, len);
Socket.send(socket, buf, len);
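On this path the data is copied four times between the file and the socket: DMA copies it from disk into a kernel buffer, the CPU copies it from the kernel buffer into the application's buffer, the CPU copies it from the application's buffer into the kernel's socket buffer, and DMA copies it from the socket buffer to the network interface.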
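The pseudocode above can be turned into a small runnable Java sketch. The class name TraditionalCopy and the helper copy are hypothetical, and a ByteArrayOutputStream stands in for the socket's output stream so that the example runs without a network peer. The point is only that every chunk passes through a user-space byte[] on its way from the file to the destination.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration of the conventional read-then-write path.
final class TraditionalCopy {

    /** Copies everything from in to out through a user-space buffer; returns the bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];           // buffer owned by the application
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {     // data crosses from the kernel into the application
            out.write(buf, 0, n);              // and is handed back to the destination
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("traditional", ".bin");
        try {
            Files.write(src, new byte[100_000]);
            // Assumption: a ByteArrayOutputStream replaces socket.getOutputStream() here.
            try (InputStream in = Files.newInputStream(src);
                 OutputStream out = new ByteArrayOutputStream()) {
                System.out.println("bytes copied: " + copy(in, out));
            }
        } finally {
            Files.delete(src);
        }
    }
}
```

With a real connection, out would be the socket's output stream, and the write call is where the data crosses back into the kernel.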
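Zero copy in Kafka
With zero copy the number of copies drops to two: DMA copies the data from disk into the kernel's page cache, and the data then goes from the page cache to the network interface without ever passing through the application.
[Figure: zero-copy data path]
In Java this is done through the transferTo method of java.nio.channels.FileChannel. Let's look at how it is implemented.
FileChannel itself is an abstract class: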
/**
* Transfers bytes from this channel's file to the given writable byte
* channel.
*
* <p> An attempt is made to read up to <tt>count</tt> bytes starting at
* the given <tt>position</tt> in this channel's file and write them to the
* target channel. An invocation of this method may or may not transfer
* all of the requested bytes; whether or not it does so depends upon the
* natures and states of the channels. Fewer than the requested number of
* bytes are transferred if this channel's file contains fewer than
* <tt>count</tt> bytes starting at the given <tt>position</tt>, or if the
* target channel is non-blocking and it has fewer than <tt>count</tt>
* bytes free in its output buffer.
*
* <p> This method does not modify this channel's position. If the given
* position is greater than the file's current size then no bytes are
* transferred. If the target channel has a position then bytes are
* written starting at that position and then the position is incremented
* by the number of bytes written.
*
* <p> This method is potentially much more efficient than a simple loop
* that reads from this channel and writes to the target channel. Many
* operating systems can transfer bytes directly from the filesystem cache
* to the target channel without actually copying them. </p>
*
* @param position
* The position within the file at which the transfer is to begin;
* must be non-negative
*
* @param count
* The maximum number of bytes to be transferred; must be
* non-negative
*
* @param target
* The target channel
*
* @return The number of bytes, possibly zero,
* that were actually transferred
*
* @throws IllegalArgumentException
* If the preconditions on the parameters do not hold
*
* @throws NonReadableChannelException
* If this channel was not opened for reading
*
* @throws NonWritableChannelException
* If the target channel was not opened for writing
*
* @throws ClosedChannelException
* If either this channel or the target channel is closed
*
* @throws AsynchronousCloseException
* If another thread closes either channel
* while the transfer is in progress
*
* @throws ClosedByInterruptException
* If another thread interrupts the current thread while the
* transfer is in progress, thereby closing both channels and
* setting the current thread's interrupt status
*
* @throws IOException
* If some other I/O error occurs
*/
public abstract long transferTo(long position, long count,
WritableByteChannel target)
throws IOException;
The key statement in this Javadoc is that the method is potentially much more efficient than a simple loop that reads from this channel and writes to the target channel, because many operating systems can transfer bytes directly from the filesystem cache to the target channel without actually copying them.
Now look at the concrete implementation class, sun.nio.ch.FileChannelImpl (in IntelliJ IDEA, Ctrl + Alt + B jumps to the implementation):
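Because the Javadoc says one invocation "may or may not transfer all of the requested bytes", callers normally call transferTo in a loop. The following is a minimal sketch of that pattern, not Kafka's own code: the class TransferToExample and the method sendFile are hypothetical names, the target is assumed to be a blocking channel, and the demo writes into an in-memory channel instead of a socket.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper showing the usual loop around FileChannel.transferTo.
final class TransferToExample {

    /** Transfers the whole file to target and returns the number of bytes transferred. */
    static long sendFile(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel source = FileChannel.open(file, StandardOpenOption.READ)) {
            long position = 0;
            long size = source.size();
            while (position < size) {
                // transferTo does not move the channel's own position, so track it here.
                long sent = source.transferTo(position, size - position, target);
                if (sent <= 0) {
                    break; // nothing more could be transferred, e.g. the file was truncated
                }
                position += sent;
            }
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("transfer", ".bin");
        try {
            Files.write(src, new byte[100_000]);
            // Assumption: an in-memory channel replaces the SocketChannel of a real server.
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            long sent = sendFile(src, Channels.newChannel(sink));
            System.out.println("bytes transferred: " + sent);
        } finally {
            Files.delete(src);
        }
    }
}
```

The in-memory target only exercises the API contract. Whether the copy is actually avoided depends on the operating system and on the kind of target channel, as the Javadoc notes. A non-blocking target would need to wait for writability instead of breaking out of the loop when zero bytes are transferred.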
public long transferTo(long var1, long var3, WritableByteChannel var5) throws IOException {
    // Decompiled names: var1 = position, var3 = count, var5 = target.
    this.ensureOpen();
    if (!var5.isOpen()) {
        throw new ClosedChannelException();
    } else if (!this.readable) {
        throw new NonReadableChannelException();
    } else if (var5 instanceof FileChannelImpl && !((FileChannelImpl) var5).writable) {
        throw new NonWritableChannelException();
    } else if (var1 >= 0L && var3 >= 0L) {
        long var6 = this.size();
        if (var1 > var6) {
            // The position is past the end of the file: nothing to transfer.
            return 0L;
        } else {
            // Cap the request at Integer.MAX_VALUE and at the bytes left in the file.
            int var8 = (int) Math.min(var3, 2147483647L);
            if (var6 - var1 < (long) var8) {
                var8 = (int) (var6 - var1);
            }
            // Try the direct path first, then the trusted-channel path,
            // and fall back to the arbitrary-channel path last.
            long var9;
            return (var9 = this.transferToDirectly(var1, var8, var5)) >= 0L ? var9 : ((var9 = this.transferToTrustedChannel(var1, (long) var8, var5)) >= 0L ? var9 : this.transferToArbitraryChannel(var1, var8, var5));
        }
    } else {
        throw new IllegalArgumentException();
    }
}
Following the call chain eventually leads to this native method in sun.nio.ch.FileChannelImpl:
private native long transferTo0(int var1, long var2, long var4, int var6);
sendfile
On Linux, this ultimately calls the sendfile function:
ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count)
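- in_fd: a file descriptor opened for reading.
- out_fd: a file descriptor opened for writing.
- offset: points to the file offset at which reading from in_fd starts, that is, how many bytes to skip before reading begins.
- count: the number of bytes to move between the two file descriptors.
Zero copy through the sendfile system call avoids the copies that come with switching between kernel and user context. It also uses direct memory access (DMA) to perform the I/O, which avoids copying the data between kernel buffers.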
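Summary and analysis
With zero copy, the data in the disk file is copied into the page cache only once and is then sent from the page cache straight to the network. The same page cache can be used when sending to different subscribers, so repeated copying is avoided.
With 10 consumers, the conventional approach copies the data 4 * 10 = 40 times, while zero copy needs only 1 + 10 = 11 copies: one copy from disk into the page cache, plus one read of the page cache for each of the 10 consumers.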
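The arithmetic generalizes to any number of consumers. Here is a small sketch under the same counting model as the paragraph above (four copies per consumer conventionally, one shared disk read plus one page-cache read per consumer with zero copy). The class and method names are hypothetical.

```java
// Hypothetical helper that reproduces the copy-count arithmetic from the text.
final class CopyCount {

    /** Conventional path: each consumer costs four copies. */
    static int traditionalCopies(int consumers) {
        return 4 * consumers;
    }

    /** Zero copy: one copy from disk into the page cache, then one read of it per consumer. */
    static int zeroCopyCopies(int consumers) {
        return 1 + consumers;
    }

    public static void main(String[] args) {
        int consumers = 10;
        System.out.println("traditional: " + traditionalCopies(consumers)); // 40
        System.out.println("zero copy:   " + zeroCopyCopies(consumers));    // 11
    }
}
```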