Java Heartbeat Detection Mechanism: Netty Heartbeat Detection

Reposted

mob64ca1402a190 2024-02-23 11:46:14


1. Connection Liveness Detection in Netty

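As a network framework, Netty provides many features, such as encoding and decoding. It also provides one very important service: a heartbeat mechanism. A heartbeat checks whether the peer is still alive, which is an essential feature of any RPC framework. Below we analyze how Netty implements its heartbeat support internally.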


Netty provides three handlers for checking connection liveness: IdleStateHandler, ReadTimeoutHandler, and WriteTimeoutHandler. This article focuses on IdleStateHandler.

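Name	Purpose
IdleStateHandler	Fires an IdleStateEvent when the connection has been idle (no reads, no writes, or neither) for too long. Handle the event by overriding the userEventTriggered method of a ChannelInboundHandler.
ReadTimeoutHandler	Raises a ReadTimeoutException and closes the connection automatically when no data is read within the specified time. Handle the exception in exceptionCaught.
WriteTimeoutHandler	Raises a WriteTimeoutException and closes the connection when a write operation cannot complete within the specified time. Handle the exception in exceptionCaught as well.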

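Both ReadTimeoutHandler and WriteTimeoutHandler close the connection automatically and raise an exception, so the application must handle that exception.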

2. Analysis of IdleStateHandler

IdleStateHandler has four configuration fields:

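// Whether to take slow outbound traffic into account; defaults to false
private final boolean observeOutput;
// Read idle time; 0 disables read-idle detection
private final long readerIdleTimeNanos;
// Write idle time; 0 disables write-idle detection
private final long writerIdleTimeNanos;
// Idle time for reads and writes combined; 0 disables all-idle detection
private final long allIdleTimeNanos;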

When an IdleStateHandler is added to a pipeline, handlerAdded() calls initialize() if the channel is already active and registered:

@Override
public void handlerAdded(ChannelHandlerContext ctx) throws Exception {
    if (ctx.channel().isActive() && ctx.channel().isRegistered()) {
        // channelActive() event has been fired already, which means this.channelActive() will
        // not be invoked. We have to initialize here instead.
        initialize(ctx);
    } else {
        // channelActive() event has not been fired yet.  this.channelActive() will be invoked
        // and initialization will occur there.
    }
}

Besides handlerAdded(), IdleStateHandler.channelActive() and IdleStateHandler.channelRegistered() also call the initialize() method shown below (channelRegistered() only does so when the channel is already active):

private void initialize(ChannelHandlerContext ctx) {
    // Avoid the case where destroy() is called before scheduling timeouts.
    // See: https://github.com/netty/netty/issues/143
    switch (state) {
    case 1:
    case 2:
        return;
    }

    state = 1;
    initOutputChanged(ctx);

    lastReadTime = lastWriteTime = ticksInNanos();
    if (readerIdleTimeNanos > 0) {
        // schedule() delegates to the EventLoop's schedule(), which adds the task to its scheduled-task queue
        readerIdleTimeout = schedule(ctx, new ReaderIdleTimeoutTask(ctx),
                readerIdleTimeNanos, TimeUnit.NANOSECONDS);
    }
    if (writerIdleTimeNanos > 0) {
        writerIdleTimeout = schedule(ctx, new WriterIdleTimeoutTask(ctx),
                writerIdleTimeNanos, TimeUnit.NANOSECONDS);
    }
    if (allIdleTimeNanos > 0) {
        allIdleTimeout = schedule(ctx, new AllIdleTimeoutTask(ctx),
                allIdleTimeNanos, TimeUnit.NANOSECONDS);
    }
}

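Each of the last three of the four fields listed above that is greater than 0 gets its own scheduled task, so several tasks are created when several values are positive. The method also sets state to 1 to prevent repeated initialization, and it calls initOutputChanged(), which initializes the fields used to monitor outbound data: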
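The scheduling rule can be modelled without Netty. The following is a minimal sketch under stated assumptions, not Netty's implementation: the class IdleTimerSketch and all of its members are hypothetical, a plain ScheduledExecutorService stands in for the channel's EventLoop, and the scheduled tasks are empty placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the scheduling logic of IdleStateHandler.initialize().
final class IdleTimerSketch {
    private final long readerIdleNanos;
    private final long writerIdleNanos;
    private final long allIdleNanos;
    private final ScheduledExecutorService loop; // plays the role of the EventLoop
    private final List<ScheduledFuture<?>> timeouts = new ArrayList<>();
    private int state; // 0 = not initialized, 1 = initialized

    IdleTimerSketch(long readerIdleNanos, long writerIdleNanos, long allIdleNanos,
                    ScheduledExecutorService loop) {
        this.readerIdleNanos = readerIdleNanos;
        this.writerIdleNanos = writerIdleNanos;
        this.allIdleNanos = allIdleNanos;
        this.loop = loop;
    }

    // Schedules one task per positive timeout and returns how many were scheduled.
    int initialize() {
        if (state == 1) {
            return 0; // already initialized: do not schedule twice
        }
        state = 1;
        int before = timeouts.size();
        scheduleIfEnabled(readerIdleNanos);
        scheduleIfEnabled(writerIdleNanos);
        scheduleIfEnabled(allIdleNanos);
        return timeouts.size() - before;
    }

    private void scheduleIfEnabled(long delayNanos) {
        if (delayNanos > 0) { // a value of 0 disables this kind of check
            timeouts.add(loop.schedule(() -> { }, delayNanos, TimeUnit.NANOSECONDS));
        }
    }

    // Cancels every pending timeout task.
    void destroy() {
        for (ScheduledFuture<?> timeout : timeouts) {
            timeout.cancel(false);
        }
        timeouts.clear();
    }
}
```

With a reader timeout and an all-idle timeout set and the writer timeout left at 0, the first initialize() call schedules two tasks and any later call schedules none.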

private void initOutputChanged(ChannelHandlerContext ctx) {
    if (observeOutput) {
        Channel channel = ctx.channel();
        Unsafe unsafe = channel.unsafe();
        ChannelOutboundBuffer buf = unsafe.outboundBuffer();

        if (buf != null) {
            lastMessageHashCode = System.identityHashCode(buf.current());
            lastPendingWriteBytes = buf.totalPendingWriteBytes();
        }
    }
}

IdleStateHandler defines three inner classes for its scheduled tasks, namely the task types created in initialize(): ReaderIdleTimeoutTask, WriterIdleTimeoutTask, and AllIdleTimeoutTask.

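These three tasks correspond to the read, write, and read-or-write events respectively. All three extend AbstractIdleTask, another inner class of IdleStateHandler. Its run() is a template method that delegates to the abstract run(ChannelHandlerContext), which each of the three subclasses must implement: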

private abstract static class AbstractIdleTask implements Runnable {

    private final ChannelHandlerContext ctx;

    AbstractIdleTask(ChannelHandlerContext ctx) {
        this.ctx = ctx;
    }

    @Override
    public void run() {
        if (!ctx.channel().isOpen()) {
            return;
        }
        // Delegate to the abstract method below.
        run(ctx);
    }
    // Left for subclasses to implement; the subclasses differ only in this method.
    protected abstract void run(ChannelHandlerContext ctx);
}

The run() method of ReaderIdleTimeoutTask:

private final class ReaderIdleTimeoutTask extends AbstractIdleTask {

    ReaderIdleTimeoutTask(ChannelHandlerContext ctx) {
        super(ctx);
    }

    @Override
    protected void run(ChannelHandlerContext ctx) {
        long nextDelay = readerIdleTimeNanos;
        if (!reading) {
            nextDelay -= ticksInNanos() - lastReadTime;
        }

        if (nextDelay <= 0) {
            // Reader is idle - set a new timeout and notify the callback.
            // Resubmit this task; the returned future is kept so the task can be cancelled later.
            readerIdleTimeout = schedule(ctx, this, readerIdleTimeNanos, TimeUnit.NANOSECONDS);

            boolean first = firstReaderIdleEvent;
            firstReaderIdleEvent = false;

            try {
                // Create the reader-idle event.
                IdleStateEvent event = newIdleStateEvent(IdleState.READER_IDLE, first);
                // Fire userEventTriggered() on the pipeline.
                channelIdle(ctx, event);
            } catch (Throwable t) {
                ctx.fireExceptionCaught(t);
            }
        } else {
            // Read occurred before the timeout - set a new timeout with shorter delay.
            readerIdleTimeout = schedule(ctx, this, nextDelay, TimeUnit.NANOSECONDS);
        }
    }
}

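The logic is roughly as follows:
① Take the read idle timeout configured by the user.
② If no read is in progress (the read has finished and channelReadComplete has run, which updates lastReadTime), subtract the time elapsed since the last read from the timeout. If the result is zero or negative, the timeout has been reached and the read-idle event fires; otherwise the task is rescheduled with the remaining delay.
③ When the read-idle event fires, the task is first put back on the queue with the originally configured timeout; schedule() returns a future that can be used to cancel it. Then firstReaderIdleEvent is set to false so that the next event is no longer marked as the first; channelRead resets it to true.
④ An IdleStateEvent for the read-idle state is created and passed to the user's userEventTriggered method, which completes the notification.
In short, every completed read records a timestamp. When the scheduled task runs, it computes the interval between the current time and the last read, and if that interval has reached the configured timeout it triggers userEventTriggered.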
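The delay arithmetic in steps ① and ② can be isolated into a small, Netty-free model. This is only a sketch: ReaderIdleSketch and its methods are hypothetical names, and the current time is passed in explicitly instead of being read from ticksInNanos().

```java
// Hypothetical model of the decision made in ReaderIdleTimeoutTask.run().
final class ReaderIdleSketch {
    private ReaderIdleSketch() { }

    // Returns the remaining delay in nanoseconds; a result <= 0 means "idle now".
    static long nextDelay(long readerIdleNanos, boolean reading,
                          long nowNanos, long lastReadNanos) {
        long nextDelay = readerIdleNanos;
        if (!reading) {
            // No read in progress: subtract the time elapsed since the last completed read.
            nextDelay -= nowNanos - lastReadNanos;
        }
        return nextDelay;
    }

    // True when the read-idle event should fire.
    static boolean isIdle(long readerIdleNanos, boolean reading,
                          long nowNanos, long lastReadNanos) {
        return nextDelay(readerIdleNanos, reading, nowNanos, lastReadNanos) <= 0;
    }
}
```

For example, with a timeout of 5000 and a last read 2000 time units ago, the result is 3000, so the task would be rescheduled with that shorter delay. While a read is still in progress, the full timeout is returned regardless of lastReadNanos.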
The run() method of WriterIdleTimeoutTask:

private final class WriterIdleTimeoutTask extends AbstractIdleTask {

     WriterIdleTimeoutTask(ChannelHandlerContext ctx) {
         super(ctx);
     }

     @Override
     protected void run(ChannelHandlerContext ctx) {

         long lastWriteTime = IdleStateHandler.this.lastWriteTime;
         long nextDelay = writerIdleTimeNanos - (ticksInNanos() - lastWriteTime);
         if (nextDelay <= 0) {
             // Writer is idle - set a new timeout and notify the callback.
             writerIdleTimeout = schedule(ctx, this, writerIdleTimeNanos, TimeUnit.NANOSECONDS);

             boolean first = firstWriterIdleEvent;
             firstWriterIdleEvent = false;

             try {
                 if (hasOutputChanged(ctx, first)) {
                     return;
                 }

                 IdleStateEvent event = newIdleStateEvent(IdleState.WRITER_IDLE, first);
                 channelIdle(ctx, event);
             } catch (Throwable t) {
                 ctx.fireExceptionCaught(t);
             }
         } else {
             // Write occurred before the timeout - set a new timeout with shorter delay.
             writerIdleTimeout = schedule(ctx, this, nextDelay, TimeUnit.NANOSECONDS);
         }
     }
 }

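The write task's run() follows essentially the same logic as the read task. The notable difference is an extra check for slow outbound data: it calls hasOutputChanged().
The run() method of AllIdleTimeoutTask: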

private final class AllIdleTimeoutTask extends AbstractIdleTask {

    AllIdleTimeoutTask(ChannelHandlerContext ctx) {
        super(ctx);
    }

    @Override
    protected void run(ChannelHandlerContext ctx) {

        long nextDelay = allIdleTimeNanos;
        if (!reading) {
            nextDelay -= ticksInNanos() - Math.max(lastReadTime, lastWriteTime);
        }
        if (nextDelay <= 0) {
            // Both reader and writer are idle - set a new timeout and
            // notify the callback.
            allIdleTimeout = schedule(ctx, this, allIdleTimeNanos, TimeUnit.NANOSECONDS);

            boolean first = firstAllIdleEvent;
            firstAllIdleEvent = false;

            try {
                // Check whether outbound data is merely being written slowly.
                if (hasOutputChanged(ctx, first)) {
                    return;
                }

                IdleStateEvent event = newIdleStateEvent(IdleState.ALL_IDLE, first);
                channelIdle(ctx, event);
            } catch (Throwable t) {
                ctx.fireExceptionCaught(t);
            }
        } else {
            // Either read or write occurred before the timeout - set a new
            // timeout with shorter delay.
            allIdleTimeout = schedule(ctx, this, nextDelay, TimeUnit.NANOSECONDS);
        }
    }
}

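Unlike the separate read and write timeouts, the all-idle delay is computed from whichever of the last read or the last write happened more recently:

// Use the more recent of the two timestamps, i.e. the time of the last read or write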
nextDelay -= ticksInNanos() - Math.max(lastReadTime, lastWriteTime);
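A worked example makes the effect of Math.max visible. The sketch below is a hypothetical, Netty-free model (AllIdleSketch is not a Netty class), with the current time passed in as a parameter.

```java
// Hypothetical model of the delay computed in AllIdleTimeoutTask.run().
final class AllIdleSketch {
    private AllIdleSketch() { }

    // Returns the remaining delay; a result <= 0 means neither reads nor writes
    // have happened for the whole all-idle timeout.
    static long nextDelay(long allIdleNanos, boolean reading,
                          long nowNanos, long lastReadNanos, long lastWriteNanos) {
        long nextDelay = allIdleNanos;
        if (!reading) {
            // Measure from the most recent activity of either kind.
            nextDelay -= nowNanos - Math.max(lastReadNanos, lastWriteNanos);
        }
        return nextDelay;
    }
}
```

With a timeout of 10, a current time of 20, a last read at 5, and a last write at 15, the result is 10 - (20 - 15) = 5, so the connection is not yet all-idle even though the last read alone is older than the timeout.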

The code of hasOutputChanged() is as follows:

private boolean hasOutputChanged(ChannelHandlerContext ctx, boolean first) {
    if (observeOutput) {

        // We can take this shortcut if the ChannelPromises that got passed into write()
        // appear to complete. It indicates "change" on message level and we simply assume
        // that there's change happening on byte level. If the user doesn't observe channel
        // writability events then they'll eventually OOME and there's clearly a different
        // problem and idleness is least of their concerns.
        if (lastChangeCheckTimeStamp != lastWriteTime) {
            lastChangeCheckTimeStamp = lastWriteTime;

            // But this applies only if it's the non-first call.
            if (!first) {
                return true;
            }
        }

        Channel channel = ctx.channel();
        Unsafe unsafe = channel.unsafe();
        ChannelOutboundBuffer buf = unsafe.outboundBuffer();

        if (buf != null) {
            int messageHashCode = System.identityHashCode(buf.current());
            long pendingWriteBytes = buf.totalPendingWriteBytes();

            if (messageHashCode != lastMessageHashCode || pendingWriteBytes != lastPendingWriteBytes) {
                lastMessageHashCode = messageHashCode;
                lastPendingWriteBytes = pendingWriteBytes;

                if (!first) {
                    return true;
                }
            }
        }
    }

    return false;
}
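The core of this check can be reduced to comparing the outbound buffer's pending byte count between two idle checks. The following sketch is a simplification under stated assumptions: OutputProgressSketch is a hypothetical class, pendingBytes stands in for ChannelOutboundBuffer.totalPendingWriteBytes(), and the timestamp and message-identity comparisons of the real method are left out.

```java
// Hypothetical, simplified model of the observeOutput check.
final class OutputProgressSketch {
    private long lastPendingBytes;

    OutputProgressSketch(long initialPendingBytes) {
        this.lastPendingBytes = initialPendingBytes;
    }

    // Returns true when the idle event should be suppressed because the
    // outbound buffer made progress since the previous check.
    boolean hasOutputChanged(long pendingBytes, boolean first) {
        if (pendingBytes != lastPendingBytes) {
            lastPendingBytes = pendingBytes; // remember the new level for the next check
            // The first idle event in a sequence is never suppressed.
            if (!first) {
                return true;
            }
        }
        return false;
    }
}
```

If the pending byte count drops from 800 to 500 between two non-first checks, the method returns true: data is still draining, so the connection is treated as slow rather than idle.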

3. Summary of Netty's Heartbeat Mechanism

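① IdleStateHandler can be used to implement heartbeats. When the server and client have had no read or write interaction for longer than the configured time, the userEventTriggered() method of the user's own handler is invoked (the user must override it). In that method the user can try to send a message to the peer and close the connection if sending fails. The default implementation in ChannelInboundHandlerAdapter simply forwards the event: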

public class ChannelInboundHandlerAdapter extends ChannelHandlerAdapter implements ChannelInboundHandler {
    ...
    @Override
    public void userEventTriggered(ChannelHandlerContext ctx, Object evt) throws Exception {
        ctx.fireUserEventTriggered(evt);
    }
    ...
}

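② IdleStateHandler is built on the EventLoop's scheduled tasks. Every read and write records the time of the last read or write. When a scheduled task runs, it checks whether the interval between the current time and that last read or write has reached the timeout given when the IdleStateHandler was created, and uses this to decide whether the connection is idle.
③ IdleStateHandler has three scheduled tasks internally, for the read-idle, write-idle, and all-idle (read or write) events. Listening for the all-idle event is usually enough.
④ IdleStateHandler also covers an edge case: the client receives data slowly, so that a single transfer takes longer than the configured idle time. The observeOutput constructor parameter decides whether the state of the outbound buffer is taken into account.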

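⑤ If outbound traffic is slow, Netty does not treat the connection as idle and does not fire the idle event. The first event is always fired, however, because at that point it is impossible to tell slow output from real idleness. Slow output can also lead to an OOM, which is a bigger problem than idleness.
⑥ If the application runs out of memory (OOM and the like) while write-idle events are very rare (with observeOutput set to true), check whether outbound data is draining too slowly.
⑦ ReadTimeoutHandler extends IdleStateHandler. When a read-idle event occurs, it calls ctx.fireExceptionCaught() with a ReadTimeoutException and then closes the channel.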

protected void readTimedOut(ChannelHandlerContext ctx) throws Exception {
    if (!closed) {
        ctx.fireExceptionCaught(ReadTimeoutException.INSTANCE);
        ctx.close();
        closed = true;
    }
}

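⑧ WriteTimeoutHandler is not built on IdleStateHandler. It works as follows: each call to write() creates a scheduled task that uses the promise passed to write() to decide whether the write has exceeded its time limit. When the task runs after the specified time and finds that the promise's isDone() returns false, the write has not finished, so it has timed out and an exception is raised. Once the write completes, the scheduled task is cancelled.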

@Override
public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise) throws Exception {
    if (timeoutNanos > 0) {
        promise = promise.unvoid();
        scheduleTimeout(ctx, promise);
    }
    ctx.write(msg, promise);
}
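The per-write timer can be modelled with standard library classes only. This is a sketch of the idea, not Netty's code: WriteTimeoutSketch is a hypothetical class, a CompletableFuture plays the role of the ChannelPromise, a ScheduledExecutorService stands in for the EventLoop, and setting a flag replaces raising the exception and closing the channel.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical, Netty-free model of a per-write timeout.
final class WriteTimeoutSketch {
    private final ScheduledExecutorService loop; // plays the role of the EventLoop
    private final long timeoutNanos;
    final AtomicBoolean timedOut = new AtomicBoolean();

    WriteTimeoutSketch(ScheduledExecutorService loop, long timeoutNanos) {
        this.loop = loop;
        this.timeoutNanos = timeoutNanos;
    }

    // writeResult stands in for the promise passed to write().
    ScheduledFuture<?> scheduleTimeout(CompletableFuture<Void> writeResult) {
        ScheduledFuture<?> timeout = loop.schedule(() -> {
            // Timer fired: a write that is still not done has timed out.
            if (!writeResult.isDone()) {
                timedOut.set(true);
            }
        }, timeoutNanos, TimeUnit.NANOSECONDS);
        // When the write finishes, successfully or not, cancel the pending timer.
        writeResult.whenComplete((value, error) -> timeout.cancel(false));
        return timeout;
    }
}
```

Completing the future before the timer fires cancels the timer and leaves timedOut false. If the future is never completed, the timer runs and sets the flag.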
