Sub-Component - Coyote OS : Redhat Linux Overview: Embedded Tomcat version 9.0.39 is implemented to transport http/2 packets between 2 systems (h2c connection). Http2 packets of payload size 55KB sent from the client are processed asynchronously . The content-type of request/response is application/json. For few requests Tomcat sends GOAWAY with reason that seems irrelevant . 1)GOAWAY with PROTOCOL ERROR : The content length header value [55294] does not agree with the size of the data received [55295] Where, both the content length header value and the size of the data received is 55295 bytes 2)GOAWAY with FRAME_SIZE_ERROR : The payload is [6386210] bytes long but the max frame size is [16384] Where , the payload size is only around 55 KB Please find attached the PCAPs for reference. We have tried and reproduced these exceptions with a sample code implementing embedded tomcat version 9.0.39 with payload size of around 55KB . Please find below the steps to reproduce. I have attached the sample code / input JMX file to reproduce the issue. Steps to Reproduce: With JMeter as simulation client, on configuring 700 threads (700 connections) to connect towards Tomcat Server 9.0.39 embedded in our system and on sending 20 requests per second with payload of 55KB (same request with just one json value sent uniquely via the Random number generator)and on letting the test run in an infinite loop , few requests are getting timed out . On analysing the PCAP for the particular request we see that tomcat sends GOAWAY with PROTOCOL ERROR / FRAME_SIZE_ERROR where the reason seems to be incorrect. Client: JMeter 5.3 with additional HTTP2 sampler. No of threads: 700 Ramp-up period:10 seconds Loop:Infinite Payload size: around 55KB Constant Throughput Timer added to limit the tps to 20. Random Variable Generator added to the JSON request to uniquely identify for which request the exception is printed and to map it in the PCAP collected . Response Timeout : 5000 ms SERVER: Sample application attached . MaxThreads configured in tomcat is 200(all other parameters are the tomcat defaults). The input requests are processed asynchronously with 40 threads. Build: Embedded tomcat 9.0.39 Resolution/Clarification requested: Can you please explain the root cause for this . Thanks in advance.
Created attachment 37513 [details] FRAME_SIZE_ERROR PCAP
Created attachment 37514 [details] PROTOCOL ERROR
Created attachment 37515 [details] JMX file to reproduce issue
Created attachment 37516 [details] Sample application to reproduce the issue
Any update on this please !
Pinging for updates on BZs is not allowed. This is not the first time you are doing this.
*** Bug 64829 has been marked as a duplicate of this bug. ***
*** Bug 64828 has been marked as a duplicate of this bug. ***
What hardware are you running this on? I've been running the test case locally for 15+ mins and I haven't seen a single error reported. Details are hardware used, memory allocated to Tomcat and JMeter, network between client and server would likely be helpful. If we can't recreate the issue, we can't debug it.
Moving to NEEDINFO as more information about the provided test case is required to reproduce the issue.
Mark , In production we see the issue reported in this bug and bug 64828 on Linux. I had reproduced the issue with the attached source code on my local windows machine (Windows 10 - 64 bit / 16 GB RAM). JMeter version :5.3 with Xmx:5g Embedded tomcat application :Default Xmx Thanks in advance !!
Thanks. I can try testing on a similar spec machine (although it will be a VM). How long does the test case have to be running before you start to see errors?
Thanks Mark . Within 15 to 20 minutes of running the case I was able to see the errors , Please let us know if you need more information .
Success. This was trivial to recreate on Windows. Must be a timing thing. Fix on the way.
Fixed in: - 10.0.x for 10.0.0-M10 onwards - 9.0.x for 9.0.40 onwards - 8.5.x for 8.5.60 onwards
Thank you soo much Mark for the fix .. One clarification please, Were you able to reproduce and fix the payload related issue reported in bug 64828 as well ? Thanks in advance !
I didn't see any payload errors reported. The other errors occurred after a few seconds on Windows so I fixed those then ran a longer test (20 mins) where no errors where observed. I don't immediately see how the issues I fixed could led to the payload errors described. It would be informative if you were to test the latest 9.0.x code and report back.
Hi Mark , I tried running the same test program with the latest tomcat 9.0.41 jars on windows machine . 1.)I am still able to see the incomplete payload related exceptions and GOAWAY (PROTOCOL_ERROR and FRAME_SIZE_ERROR) . The test (payload size 55KB) ran for about 8 minutes where few requests had timed-out (could see in JMeter GUI). On filtering the requests in the PCAP captured with the unique identifier , I was able to find the trace for 3 requests with the below reason for GOAWAY (GOAWAY.pcap attached) *) FRAME_SIZE_ERROR:The payload is [2128653] bytes long but the maximum frame size is [16384] *) PROTOCOL_ERROR :Connection [7004], Stream [1], The content length header value [56,465] does not agree with the size of the data received [56,466] *) PROTOCOL_ERROR :Connection [7092], Stream [1], The content length header value [56,466] does not agree with the size of the data received [56,466] Specs: Windows 10 Processor:Intel(R) Core(TM) i5-8350U CPU @ 1.70GHz 1.90GHz RAM:16 GB System Type : 64 bit Please let me know in case of further inputs. Thanks in advance!
Created attachment 37600 [details] PCAP with Goaway
Looks like there is still an issue related to the HPACK decoder. I can see a debug log where the decoder reports a content-length that is not consistent with the value shown in the Wireshark trace. Still trying to figure out how this is happening.
Found it. It wasn't in the HPACK decoder at all. I've applied a patch and I can no longer get the test case to fail. Could you retest with a new 9.0.x build?
I am going to assume that the recent commit fixed this. If this is not the case please do re-open but I think we'd need a new test case in that case as I can no longer trigger any errors with this one.