A brief look at gRPC client connection management

Background

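Our client SDK uses gRPC as its communication protocol and sends a pingServer request to the server on a timer (roughly every 120 s).
The server listens on port 80, e.g. xxx:80.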

Problem

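The client turned out to be reconnecting to the server over and over, each time from a new local port.
This is what netstat -antp showed: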

[Screenshot: netstat -antp output]

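In the screenshot, the connection to the server address highlighted in red is in TIME_WAIT, and below it there is a newer connection to the same server in ESTABLISHED.
TIME_WAIT is the state entered by the side that closes first, so it was the client that actively closed the connection.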
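To make this check repeatable without reading the netstat output by eye, the states can be tallied per remote endpoint. The following is a minimal sketch, assuming the usual Linux column order of `netstat -ant` (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); `CountStates` is a hypothetical helper, not part of any tool mentioned here.

```cpp
#include <map>
#include <sstream>
#include <string>

// Counts TCP states for connections whose foreign address equals `remote`
// (for example "203.0.113.7:80") in the text printed by `netstat -ant`.
// Assumes the Linux column order: Proto Recv-Q Send-Q Local Foreign State.
std::map<std::string, int> CountStates(const std::string& netstat_output,
                                       const std::string& remote) {
    std::map<std::string, int> counts;
    std::istringstream lines(netstat_output);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string proto, recv_q, send_q, local, foreign, state;
        // Lines with fewer than six fields are skipped.
        if (!(fields >> proto >> recv_q >> send_q >> local >> foreign >> state)) {
            continue;
        }
        // Header lines are rejected here; tcp and tcp6 rows are kept.
        if (proto.rfind("tcp", 0) != 0) continue;
        if (foreign != remote) continue;
        ++counts[state];
    }
    return counts;
}
```

Feeding it the captured netstat text and the server address gives a map such as TIME_WAIT -> n, ESTABLISHED -> m. A TIME_WAIT count that keeps growing on the client is the pattern described above.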

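This conflicted with what I thought I knew: gRPC is supposed to use long-lived connections, so why was the connection torn down every time? Doesn't that turn it into a short-lived connection?
And since the client was the one closing it, could something be wrong on the client side?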

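With these questions in mind, I took a packet capture on the client.
The client always received a packet of length 17, after which it sent a FIN and went through the TCP teardown.
Opening the tcpdump capture in Wireshark showed that this length-17 packet is a GOAWAY frame.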

As shown:

[Screenshot: Wireshark view of the GOAWAY frame]

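GOAWAY is the "graceful" shutdown mechanism defined by HTTP/2.

The GOAWAY frame is described in the HTTP/2 specification (RFC 7540, section 6.8):

HTTP/2 GOAWAY description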
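The length of 17 seen in the capture is consistent with the frame layout: an HTTP/2 frame header is 9 bytes (24-bit payload length, 8-bit type, 8-bit flags, 31-bit stream identifier), and a GOAWAY payload without debug data is 8 bytes (last stream ID plus error code). The sketch below decodes such a frame. The struct and function names are hypothetical, and the sample bytes used with it are illustrative, not taken from the actual capture.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Fields of an HTTP/2 GOAWAY frame (frame type 0x7).
struct GoAway {
    std::uint32_t last_stream_id;  // highest stream the sender may have processed
    std::uint32_t error_code;      // 0 means NO_ERROR, i.e. a graceful shutdown
    std::size_t debug_data_len;    // optional opaque data after the fixed fields
};

// Reads a 32-bit big-endian integer.
static std::uint32_t ReadBE32(const std::uint8_t* p) {
    return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
           (std::uint32_t{p[2]} << 8) | std::uint32_t{p[3]};
}

// Parses one complete frame from `buf`. Returns false if the buffer is too
// short, the frame is not a GOAWAY, or the payload is shorter than 8 bytes.
bool ParseGoAway(const std::vector<std::uint8_t>& buf, GoAway* out) {
    constexpr std::size_t kHeaderLen = 9;
    constexpr std::uint8_t kTypeGoAway = 0x7;
    if (buf.size() < kHeaderLen) return false;
    const std::size_t payload_len = (std::size_t{buf[0]} << 16) |
                                    (std::size_t{buf[1]} << 8) | buf[2];
    if (buf[3] != kTypeGoAway) return false;
    if (payload_len < 8 || buf.size() < kHeaderLen + payload_len) return false;
    const std::uint8_t* payload = buf.data() + kHeaderLen;
    out->last_stream_id = ReadBE32(payload) & 0x7FFFFFFFu;  // drop reserved bit
    out->error_code = ReadBE32(payload + 4);
    out->debug_data_len = payload_len - 8;
    return true;
}
```

A 17-byte buffer whose header declares a payload length of 8 and type 0x7 parses successfully with `debug_data_len == 0`. Any debug data the server attached would make the packet longer than 17 bytes.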

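From what I already knew about gRPC, the client resolves the domain name and then maintains a load balancer (lb) over the resolved addresses,
so this looked like gRPC's management of idle connections. The pingServer interval is 120 s, but gRPC treats the connection as idle in between,
so the server tells the client to close the idle connection?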

To test this idea I modified the gRPC demo. Our client uses the C++ gRPC asynchronous API,
so, based on gRPC's async demo, I wrote a simple async_client that calls the server.

Code:

#include <chrono>
#include <iostream>
#include <memory>
#include <string>
#include <thread>

#include <grpcpp/grpcpp.h>
#include <grpc/support/log.h>

#include "gateway.grpc.pb.h"

using grpc::Channel;
using grpc::ClientAsyncResponseReader;
using grpc::ClientContext;
using grpc::CompletionQueue;
using grpc::Status;
using yournamespace::PingReq;
using yournamespace::PingResp;
using yournamespace::srv;

class GatewayClient {
  public:
    explicit GatewayClient(std::shared_ptr<Channel> channel)
            : stub_(srv::NewStub(channel)) {}

    // Assembles the ping request and sends it to the server asynchronously.
    void PingServer() {
        // Data we are sending to the server.
        PingReq request;
        request.set_peerid("1111111111111113");
        request.set_clientinfo("");

        request.set_capability(1);
        request.add_iplist(4197554190);
        request.set_tcpport(8080);
        request.set_udpport(8080);
        request.set_upnpip(4197554190);
        request.set_upnpport(8080);
        request.set_connectnum(10000);
        request.set_downloadingspeed(100);
        request.set_uploadingspeed(10);
        request.set_maxdownloadspeed(0);
        request.set_maxuploadspeed(0);

        // Call object to store RPC data; it is freed in AsyncCompleteRpc().
        AsyncClientCall* call = new AsyncClientCall;

        // stub_->AsyncPing() creates the RPC object and starts the call
        // immediately, so no separate StartCall() is needed (that is only
        // required after PrepareAsyncPing()). Because we are using the
        // asynchronous API, we need to hold on to the "call" instance in
        // order to get updates on the ongoing RPC.
        call->response_reader =
            stub_->AsyncPing(&call->context, request, &cq_);

        // Request that, upon completion of the RPC, "reply" be updated with the
        // server's response; "status" with the indication of whether the operation
        // was successful. Tag the request with the memory address of the call object.
        call->response_reader->Finish(&call->reply, &call->status, static_cast<void*>(call));

    }

    // Loop while listening for completed responses.
    // Prints out the response from the server.
    void AsyncCompleteRpc() {
        void* got_tag;
        bool ok = false;

        // Block until the next result is available in the completion queue "cq".
        while (cq_.Next(&got_tag, &ok)) {
            // The tag in this example is the memory location of the call object
            AsyncClientCall* call = static_cast<AsyncClientCall*>(got_tag);

            // Verify that the request was completed successfully. Note that "ok"
            // corresponds solely to the request for updates introduced by Finish().
            GPR_ASSERT(ok);

            if (call->status.ok()) {
                std::cout << "xNetClient received: " << call->reply.code()
                          << "  task:" << call->reply.tasks_size()
                          << "  pinginterval:" << call->reply.pinginterval() << std::endl;
            } else {
                std::cout << "RPC failed: status = " << call->status.error_code()
                          << " (" << call->status.error_message() << ")" << std::endl;
            }

            // Once we're complete, deallocate the call object.
            delete call;
        }
    }

  private:

    // struct for keeping state and data information
    struct AsyncClientCall {
        // Container for the data we expect from the server.
        PingResp reply;

        // Context for the client. It could be used to convey extra information to
        // the server and/or tweak certain RPC behaviors.
        ClientContext context;

        // Storage for the status of the RPC upon completion.
        Status status;

        // Reader through which the asynchronous unary response is delivered.
        std::unique_ptr<ClientAsyncResponseReader<PingResp>> response_reader;
    };

    // Out of the passed in Channel comes the stub, stored here, our view of the
    // server's exposed services.
    std::unique_ptr<srv::Stub> stub_;

    // The producer-consumer queue we use to communicate asynchronously with the
    // gRPC runtime.
    CompletionQueue cq_;
};

int main(int argc, char** argv) {
    // The target (domain:port) is taken from the command line.
    if (argc < 2) {
        std::cout << "usage: " << argv[0] << " domain:port" << std::endl;
        std::cout << "eg: " << argv[0] << " gw.xnet.xcloud.sandai.net:80" << std::endl;
        return 1;
    }

    // Instantiate the client. It requires a channel, out of which the actual RPCs
    // are created. We indicate that the channel isn't authenticated
    // (use of InsecureChannelCredentials()).
    GatewayClient xNetClient(grpc::CreateChannel(argv[1], grpc::InsecureChannelCredentials()));

    // Spawn reader thread that loops indefinitely
    std::thread thread_(&GatewayClient::AsyncCompleteRpc, &xNetClient);

    for (int i = 0; i < 1000; i++) {
        xNetClient.PingServer();  // The actual RPC call!
        std::this_thread::sleep_for(std::chrono::seconds(120));
    }

    std::cout << "Press control-c to quit" << std::endl << std::endl;
    thread_.join();  // blocks forever: the completion queue is never shut down

    return 0;
}

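The rest is simple: run it.
Watching with netstat -natp, the problem reproduces: async_client also disconnects and reconnects.
Further testing showed that with the send interval set to 10 s the connection is kept, while with intervals longer than 10 s the connection is almost always dropped.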
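The observation can be written down as a tiny model: if the peer drops any connection that stays silent longer than some idle limit, then every request sent after a longer gap needs a fresh connection. The sketch below only illustrates that arithmetic. The function name and the `idle_timeout_s` parameter are hypothetical, and the value 10 mirrors the threshold observed in this test, not a documented constant.

```cpp
// Returns how many TCP connections are opened when `requests` requests are
// sent `interval_s` seconds apart and the peer drops a connection once it has
// been silent for more than `idle_timeout_s` seconds (hypothetical parameter).
int ConnectionsOpened(int requests, int interval_s, int idle_timeout_s) {
    int opened = 0;
    bool connected = false;
    for (int i = 0; i < requests; ++i) {
        if (!connected) {  // first request, or the previous connection was dropped
            ++opened;
            connected = true;
        }
        // After this request the link stays silent for interval_s seconds;
        // it survives only if that gap does not exceed the idle limit.
        connected = interval_s <= idle_timeout_s;
    }
    return opened;
}
```

With the 1000 pings of the test program, a 10 s interval and a 10 s limit give a single connection, whereas a 120 s interval gives 1000 connections, one per ping, which matches the reconnect pattern seen in netstat.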

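Summary

To sum up:
This is how connections were managed in our setup, with no channel options changed: when no data has been sent for more than about 10 s, the connection is treated as idle. The server side sends the client a GOAWAY frame, and on receiving it the client actively closes the connection. The next time the client needs to send, it establishes a new connection.

I do not yet know whether there is a configuration option that changes this value. I am still not very familiar with gRPC's internals and will look into it further.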
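As a starting point for that follow-up, gRPC C++ does expose channel arguments related to idle connections and keepalive. The fragment below is a configuration sketch, not a verified fix. The timing values are placeholders, `MakeKeepaliveChannel` and `AllowLongIdle` are hypothetical helper names, and it has not been established here whether the GOAWAY comes from a gRPC server or from a proxy in front of it.

```cpp
#include <memory>
#include <string>

#include <grpcpp/grpcpp.h>

// Client side: ask the channel to send HTTP/2 keepalive pings even when no
// RPC is in flight. The 30 s / 10 s values are placeholders.
std::shared_ptr<grpc::Channel> MakeKeepaliveChannel(const std::string& target) {
    grpc::ChannelArguments args;
    args.SetInt(GRPC_ARG_KEEPALIVE_TIME_MS, 30 * 1000);      // ping period
    args.SetInt(GRPC_ARG_KEEPALIVE_TIMEOUT_MS, 10 * 1000);   // wait for ping ack
    args.SetInt(GRPC_ARG_KEEPALIVE_PERMIT_WITHOUT_CALLS, 1); // ping while idle
    return grpc::CreateCustomChannel(
        target, grpc::InsecureChannelCredentials(), args);
}

// Server side (only applicable if the server is a gRPC C++ server we control):
// raise the time a connection may stay without RPCs before the server sends
// GOAWAY. The 10-minute value is a placeholder.
void AllowLongIdle(grpc::ServerBuilder& builder) {
    builder.AddChannelArgument(GRPC_ARG_MAX_CONNECTION_IDLE_MS, 10 * 60 * 1000);
}
```

Two caveats apply. A gRPC server that does not expect such frequent pings may answer them with its own GOAWAY, so the server's ping policy has to allow them. And if the idle limit is enforced on RPC activity rather than on any traffic, keepalive pings alone may not prevent the disconnect, in which case the server-side setting or a shorter ping interval is the relevant knob.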