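Symptoms

Some requests are encrypted by the frontend before being sent to the gateway. The gateway decrypts them, forwards them to the actual microservice, then encrypts the result and returns it to the frontend.
After implementing gateway encryption, we found that a non-encrypted GET request sent immediately after an encrypted request failed with a 400 error. Sending the same GET request again succeeded. The access log of the backend microservice, which records the requests it receives from the gateway, showed that the failing request had not been parsed correctly: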

## The request that returned 400
- - - [04/Jan/2018:19:48:30 +0800] "-" 400 - 0 0.000 - "-" null null 10.120.242.152
## A normal request
- - - [04/Jan/2018:19:50:18 +0800] "GET /v1/api/XXX HTTP/1.1" 200 156 11 0.011 http://www.xxx.com "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36" http-nio-8111-exec-28 10.120.242.151 10.120.242.152

Locating the problem

First we looked at the packet capture of the request that returned 400. The HTTP message on the wire was complete:

19:48:30.224244 52:54:00:32:c5:5e > 52:54:00:66:bc:63, ethertype IPv4 (0x0800), length 1762: (tos 0x0, ttl 64, id 50111, offset 0, flags [DF], proto TCP (6), length 1748)
    10.120.242.152.27725 > 10.120.242.151.8111: Flags [P.], cksum 0x00e7 (incorrect -> 0xfdf0), seq 917602625:917604321, ack 2125955651, win 29, options [nop,nop,TS val 793264903 ecr 3278809206], length 1696
    0x0000:  4500 06d4 c3bf 4000 4006 7644 0a78 f298  E.....@.@.vD.x..
    0x0010:  0a78 f297 6c4d 1faf 36b1 8141 7eb7 8243  .x..lM..6..A~..C
    0x0020:  8018 001d 00e7 0000 0101 080a 2f48 4307  ............/HC.
    0x0030:  c36e a876 4745 5420 2f76 312f 6669 6e41  .n.vGET./v1/finA
    0x0040:  6363 732f 6669 6e41 6363 2f75 7365 7242  ccs/finAcc/userB
    0x0050:  616c 2f4b 4553 2048 5454 502f 312e 310d  al/KES.HTTP/1.1.
    ......
    0x06d0:  0d0a 0d0a                                ....

A breakpoint in the Tomcat container code, however, showed that the content Tomcat actually read was truncated: the leading GET and the request path were missing.

Next, look at the packet for the preceding encrypted request:

11:03:27.703518 52:54:00:32:c5:5e > 52:54:00:66:bc:63, ethertype IPv4 (0x0800), length 1832: (tos 0x0, ttl 64, id 12872, offset 0, flags [DF], proto TCP (6), length 1818)
    10.120.242.152.15124 > 10.120.242.151.8111: Flags [P.], cksum 0x012d (incorrect -> 0xd94b), seq 84397903:84399669, ack 2813208375, win 33, options [nop,nop,TS val 777069391 ecr 3262603428], length 1766
    0x0000:  4500 071a 3248 4000 4006 0776 0a78 f298  E...2H@.@..v.x..
    0x0010:  0a78 f297 3b14 1faf 0507 cf4f a7ae 2737  .x..;......O..'7
    ......
    0x06b0:  436f 6e74 656e 742d 4c65 6e67 7468 3a20  Content-Length:.
    0x06c0:  3630 0d0a 436f 6e6e 6563 7469 6f6e 3a20  108.Connection:.
    0x06d0:  4b65 6570 2d41 6c69 7665 0d0a 0d0a 7b22  Keep-Alive....{"
    0x06e0:  7068 6f6e 654e 6f22 3a22 3235 3437 3635  phoneNo":"254765
    0x06f0:  3433 3231 3030 222c 2270 6179 416d 6f75  432100","payAmou
    0x0700:  6e74 223a 3130 3030 3030 3030 2c22 7061  nt":10000000,"pa
    0x0710:  7943 6849 6422 3a31 307d                 yChId":10}

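The Content-Length header at the end is wrong: it should be 60, not 108.
The body was 108 bytes long before decryption and 60 bytes after it. This is most likely why Tomcat lost part of the next request: on a keep-alive connection it trusts Content-Length, so it treats the first 48 bytes of whatever follows as the remainder of this body.

Changing Content-Length to 60 in the debugger made the problem disappear, which confirms the cause.

When we decrypted and rewrote the request, we had not actually updated Content-Length.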
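
To make the framing problem concrete, here is a minimal sketch of how a stale Content-Length shifts the start of the next request on a shared connection. It is a toy model, not Tomcat's parser. The body and the GET path are taken from the captures above; the POST path, the Host header and the class name `KeepAliveFraming` are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;

// Toy model of request framing on one keep-alive connection (not Tomcat's real parser).
class KeepAliveFraming {
    // Decrypted body from the capture above.
    static final String BODY =
            "{\"phoneNo\":\"254765432100\",\"payAmount\":10000000,\"payChId\":10}";
    // The request that follows on the same connection (Host value is hypothetical).
    static final String NEXT =
            "GET /v1/finAccs/finAcc/userBal/KES HTTP/1.1\r\nHost: example.internal\r\n\r\n";

    // Returns the line a server would take as the second request line after
    // consuming declaredLength bytes as the body of the first request.
    static String secondRequestLine(int declaredLength) {
        String first = "POST /v1/api/XXX HTTP/1.1\r\nContent-Length: "
                + declaredLength + "\r\n\r\n" + BODY;
        String wire = first + NEXT;                      // ASCII only, so chars == bytes
        int bodyStart = wire.indexOf("\r\n\r\n") + 4;    // first byte after the header block
        int nextStart = Math.min(bodyStart + declaredLength, wire.length());
        String rest = wire.substring(nextStart);         // what is left for request two
        int eol = rest.indexOf("\r\n");
        return eol < 0 ? rest : rest.substring(0, eol);
    }

    public static void main(String[] args) {
        System.out.println("body bytes   : " + BODY.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("declared 60  : " + secondRequestLine(60));
        System.out.println("declared 108 : " + secondRequestLine(108));
    }
}
```

With the correct length the second request line is intact. With the stale length the method and path are consumed as body bytes, which matches the truncated content seen in the debugger.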

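Solutions

1. Switch containers. With Jetty the problem disappears, because Jetty's NIO connector does not process the Content-Length field. However, switching containers is a large change overall, and our scenario (a large number of short, small requests) suits Tomcat.
2. Create a new HttpClient connection for every request. Across separate connections Tomcat's NIO connector does not lose data, but this costs performance and is not recommended.
3. Set Content-Length correctly. This is clearly the best option, but finding the right place to change it took some time. The core code is shown here: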

Every request that passes through the Zuul gateway is a Ribbon call. The Ribbon call carries a context, and that context includes a contentLength entry:

RibbonRoutingFilter.java

protected RibbonCommandContext buildCommandContext(RequestContext context) {
        HttpServletRequest request = context.getRequest();

        MultiValueMap<String, String> headers = this.helper
                .buildZuulRequestHeaders(request);
        MultiValueMap<String, String> params = this.helper
                .buildZuulRequestQueryParams(request);
        String verb = getVerb(request);
        InputStream requestEntity = getRequestBody(request);
        if (request.getContentLength() < 0 && !verb.equalsIgnoreCase("GET")) {
            context.setChunkedRequestBody();
        }

        String serviceId = (String) context.get(SERVICE_ID_KEY);
        Boolean retryable = (Boolean) context.get(RETRYABLE_KEY);

        String uri = this.helper.buildZuulRequestURI(request);

        // remove double slashes
        uri = uri.replace("//", "/");

        long contentLength = useServlet31 ? request.getContentLengthLong(): request.getContentLength();

        return new RibbonCommandContext(serviceId, verb, uri, retryable, headers, params,
                requestEntity, this.requestCustomizers, contentLength);
    }

Note the line long contentLength = useServlet31 ? request.getContentLengthLong(): request.getContentLength();. With Tomcat, the underlying request is an instance of org.apache.catalina.connector.Request:

@Override
public long getContentLengthLong() {
    return coyoteRequest.getContentLengthLong();
}

@Override
public int getContentLength() {
    return coyoteRequest.getContentLength();
}

Going one level deeper, these are the corresponding coyoteRequest methods:

public int getContentLength() {
    long length = getContentLengthLong();

    if (length < Integer.MAX_VALUE) {
        return (int) length;
    }
    return -1;
}

public long getContentLengthLong() {
    if( contentLength > -1 ) {
        return contentLength;
    }

    MessageBytes clB = headers.getUniqueValue("content-length");
    contentLength = (clB == null || clB.isNull()) ? -1 : clB.getLong();

    return contentLength;
}
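
The lazy caching in getContentLengthLong() matters here. The sketch below is a simplified stand-in for that logic, not the real org.apache.coyote.Request; the class name `CachedLengthRequest` and its `setHeader` method are hypothetical. It illustrates one plausible reason a header-only change does not stick: once the length has been read, later header changes are ignored unless the cached field itself is overwritten.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the caching logic shown above (hypothetical class).
class CachedLengthRequest {
    private final Map<String, String> headers = new HashMap<>();
    private long contentLength = -1;   // -1 means "not resolved yet"

    void setHeader(String name, String value) {
        headers.put(name, value);
    }

    // Same shape as the Coyote method: resolve from the header once, then reuse.
    long getContentLengthLong() {
        if (contentLength > -1) {
            return contentLength;
        }
        String value = headers.get("content-length");
        contentLength = (value == null) ? -1 : Long.parseLong(value);
        return contentLength;
    }

    // Overwrites the cached value directly.
    void setContentLength(long len) {
        contentLength = len;
    }

    public static void main(String[] args) {
        CachedLengthRequest request = new CachedLengthRequest();
        request.setHeader("content-length", "108");
        System.out.println("first read        : " + request.getContentLengthLong());
        request.setHeader("content-length", "60");   // header changed after the first read
        System.out.println("after header edit : " + request.getContentLengthLong());
        request.setContentLength(60);                 // cached field overwritten
        System.out.println("after setter      : " + request.getContentLengthLong());
    }
}
```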

So for Tomcat, the content length has to be updated after the body is decrypted. To do that, add the following code to the Wrapper or Filter that performs the decryption:

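// Tomcat only: fix the content length after decryption, otherwise the next request on the connection breaks.
// Requires java.lang.reflect.Field, org.apache.catalina.connector.Request and org.apache.catalina.connector.RequestFacade.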
if (request instanceof com.netflix.zuul.http.HttpServletRequestWrapper) {
    com.netflix.zuul.http.HttpServletRequestWrapper request1 = (com.netflix.zuul.http.HttpServletRequestWrapper) request;
    RequestFacade requestFacade = (RequestFacade) request1.getRequest();
    try {
        Field field = RequestFacade.class.getDeclaredField("request");
        field.setAccessible(true);
        Request o = (Request) field.get(requestFacade);
        // Store the decrypted body length on the underlying Coyote request
        o.getCoyoteRequest().setContentLength(this.contentLength);
    } catch (Exception e) {
        log.info("catch exception: ", e);
    }
}
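
The snippet above relies on reading a private field through reflection. The sketch below shows the same technique using only the standard library, with hypothetical stand-in classes (`Facade`, `InnerRequest`) in place of Tomcat's RequestFacade and Request. It also guards the unwrapping step with an instanceof check, so the fix is skipped rather than throwing when the inner object is not the expected type. Treat it as an illustration under those assumptions, not as the code used in the gateway.

```java
import java.lang.reflect.Field;

// Stand-in classes only; the real types are Tomcat's RequestFacade and Request.
class FacadeUnwrapDemo {
    static class InnerRequest {
        long contentLength = 108;          // stale length from before decryption
    }

    static class Facade {
        private final InnerRequest request; // private, like the field read in the fix above
        Facade(InnerRequest request) {
            this.request = request;
        }
    }

    // Reads a private field by name; wraps reflection failures in an unchecked exception.
    static Object readPrivateField(Object target, String fieldName) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            return field.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot read field " + fieldName, e);
        }
    }

    // Returns true if the length was fixed, false if the object is not the expected facade.
    static boolean fixLength(Object maybeFacade, long decryptedLength) {
        if (!(maybeFacade instanceof Facade)) {
            return false;                   // different container: leave the request alone
        }
        InnerRequest inner = (InnerRequest) readPrivateField(maybeFacade, "request");
        inner.contentLength = decryptedLength;
        return true;
    }

    public static void main(String[] args) {
        InnerRequest inner = new InnerRequest();
        System.out.println("fixed: " + fixLength(new Facade(inner), 60)
                + ", length now " + inner.contentLength);
        System.out.println("fixed: " + fixLength("not a facade", 60));
    }
}
```

Because the field name is looked up as a string, this approach depends on container internals and may stop working if a later version renames the field.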