
Solving Tomcat Throughput Issues on Windows

By Kary | case study

Apr 14

Here’s another real-world example of troubleshooting performance problems. An email subscriber, Simon, contacted me about poor Tomcat throughput on Windows 2008 R2: whenever Tomcat served a large file, download speed sucked. To Simon’s credit, he eliminated “the network” by verifying that Apache and IIS did not have the same issue.

Here’s where a mediocre network engineer or admin might say “sorry, not my problem” and push it back to the application guys or server guys. Not Simon. Simon understands that no one cares if your network is fast, efficient, and well designed. People care about applications and when applications perform poorly over the network, people yell and someone has to solve it. I love it when people step up and own a problem that’s not technically theirs. That’s what top performers do.

Further to Simon’s credit, he diagnosed the issue and came up with a solution. He asked for my thoughts and whether there was another possible solution.

Download the pcap from this video

My Wireshark code changes for Segments ACKed. As of Wireshark 2.2, the “bytes since last PSH bit” is exposed as tcp.analysis.push_bytes_sent.

A PacketBomb viewer wrote a Wireshark Lua script to do the same thing

 

Share this post! Spread the packet gospel!


About the Author

I like being the hero. Being able to drop a bucket of root cause analysis on a burning network problem has made me a hero (to some people) and it feels real good, y’all. Get good at packet analysis and be the hero too. I also like french fries.

Leave a Comment:

(13) comments

Vladimir April 17, 2015

Hi Kary,
Thanks for the video!

How did you add the filter tcp.analysis.segments_acked == xx and the “Buffer bytes” column?
I don’t have it in my Wireshark. Is it some custom programming?

BR,
Vladimir

Reply
    Kary April 17, 2015

    Vladimir,

    Yes, it was a bit of custom programming. I will try to commit it upstream to Wireshark.

    Reply
uhei April 17, 2015

Great Video! Thanks for your work.

Will you commit your changes to Wireshark?
I would really appreciate this!
If not, can you provide the diff for your changes?

Thanks!

Reply
    Kary April 17, 2015

    I will do my best to get the changes committed.

    Reply
Dharmady April 18, 2015

Thanks for sharing this, really informative.

Reply
chrismarget April 20, 2015

Hey Kary,

About the delayed ACK timer…

You said the receiving TCP starts the timer on segment arrival, and kills the timer when a non-timer event (2nd segment) rolls in.

Do you know if it really works this way? I’m under the impression that it’s common for stacks to run a periodic timer (like a metronome), rather than one-shot timers for these sorts of events. The upshot of that distinction is that rather than delaying ACKs *exactly* 200ms, ACKs get delayed *up to* 200ms.

Thanks!

Reply
    Kary April 20, 2015

    Details, Chris, details.

    Yes you’re right.

    Both TCP timers, the 200- and 500-ms timers, go off at times relative to when the kernel was bootstrapped. Whenever TCP sets a timer, it can go off anywhere between 1-200 or 1-500 ms in the future.

    From TCP/IP Illustrated Vol 1
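
A tiny sketch of the distinction Chris raised, assuming a BSD-style periodic delayed-ACK clock as described in Stevens: a lone segment only waits until the next tick of the clock, so the actual delay is anywhere up to 200 ms rather than exactly 200 ms.

```python
import random

TICK_MS = 200.0  # periodic delayed-ACK clock tick (BSD-style, per Stevens)

def delayed_ack_wait(arrival_ms: float) -> float:
    """How long a lone segment waits for the next clock tick to fire its ACK."""
    return TICK_MS - (arrival_ms % TICK_MS)

# Segments arriving at random times wait anywhere up to 200 ms -- not a
# fixed 200 ms -- which matches the "*up to* 200ms" behavior above.
waits = [delayed_ack_wait(random.uniform(0, 10_000)) for _ in range(1_000)]
assert 0 < min(waits) <= max(waits) <= TICK_MS
```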

    Reply
      chrismarget April 20, 2015

      Ha!

      I’m sorry if it came across as nitpicking. I was really wondering if you knew something about the specific stack you were looking at in the example.

      I mean… It’s possible that stacks (especially offload engines) do something new since Stevens, right? :)

      Also, this got me wondering… It *feels* like TCP implementations should have per-socket delayed ACK timers, doesn’t it? Flooding the link with lots of delayed ACKs *all at the same instant* because the server’s got lots of sockets open seems to defeat the purpose.

      Dammit Kary, every time you post a video, I wind up in the lab. Why is that?

      Reply
        Kary April 20, 2015

        The stack was Windows 2008. This was 6 months ago, so if I came across a source that explained Windows timers, I’ve forgotten it by now. Stevens was written about BSD, but it’s good enough most of the time :)

        Didn’t we (drunkenly) argue over TCP timers or some other TCP detail at Networking Field Day? Haha

        There’s a lot more that I don’t know versus I do know, so my only real job here is to inspire. So go get in that lab!

        My only complaint is you keep changing your email address and making me approve your comments. I approve of you, Chris, I approve.

        Reply
chrismarget April 20, 2015

Hey Kary,

I think the answer to “why does this only happen when using 64KB send buffer?” is here:

https://support.microsoft.com/en-us/kb/823764

It’s clear (ish) from the PSH flag that the application is performing 64KB writes, and the send window was 64KB, so it correlates nicely.

Didn’t this non-windowed buffer behavior inside the TCP from Redmond appear in the comments on one of your previous posts?

Also, I have a theory about the occasional single-segment, non-delayed ACK: Do lots of them show up at 200ms intervals? If so, they could be the result of the periodic delayed ACK timer firing between the arrival of two segments which would otherwise have produced an ACK for two segments.
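
One way to sanity-check that theory against the trace: export the lone-ACK timestamps from the pcap (e.g. with tshark’s `-T fields -e frame.time_relative`) and see whether they cluster near a 200 ms grid. A rough sketch, with made-up timestamps standing in for real exported values:

```python
def near_200ms_grid(ts_seconds: float, tolerance: float = 0.010) -> bool:
    """True if the timestamp lands within `tolerance` of a 200 ms tick."""
    offset = ts_seconds % 0.200
    return offset < tolerance or offset > 0.200 - tolerance

# Illustrative timestamps only -- substitute values exported from the pcap.
lone_ack_times = [0.201, 0.399, 0.602, 0.713]
on_grid = [t for t in lone_ack_times if near_200ms_grid(t)]
# If most lone ACKs land on the grid, the periodic-timer theory holds up.
```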

Reply
    Kary April 20, 2015

    Nice find with the KB. Simon and I found it as well and we thought it was the likeliest explanation we could find, though I could repro the issue on Tomcat with blocking IO, so it doesn’t completely add up.

    I went back and read through my email on this and found this:

    I’ve reproduced the issue with two Win2k8R2 AWS instances. It’s definitely a combination of delayed ACK and crappy sending-application behavior. The inconsistent delayed ACK issue seems triggered when the sending side uses 64k buffers. If I alter the tomcat config to use non-blocking IO and larger buffers (http://javaagile.blogspot.com/2010/08/tomcat-tuning.html – though I increased the send buffer to 1MB in the config), I saw zero delayed ACKs and much better throughput.

    I had the same thought about the non-delayed ACKs today. I’ll take a look through the pcap. It’s linked under the video if you wanna take a look too.
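
The change described in the quoted email (switching to the NIO connector and enlarging the send buffer) would look roughly like this in Tomcat’s server.xml. The attribute names come from the Tomcat NIO connector documentation; the values are illustrative, not the exact config from the case:

```xml
<!-- Sketch only: NIO connector with a larger socket send buffer.
     Values are illustrative, not from the actual case. -->
<Connector port="8080"
           protocol="org.apache.coyote.http11.Http11NioProtocol"
           socket.txBufSize="1048576"
           socket.appWriteBufSize="32768" />
```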

    Reply
Charlie April 20, 2015

Hi Kary, loved the video and the explanation!

Is there any way you can show us the before and after performance of this? (From the real-world case scenario)

Thank you

Reply
Adam November 4, 2016

This is great. I love the site and am surprised I hadn’t seen this video, because it’s really relevant: I’m troubleshooting a very similar issue right now with Tomcat version 6.
Someone from the Wireshark thread linked it for me.
https://ask.wireshark.org/questions/56957/http-server-limiting-transfer-rate

In my case there is 220ms of latency (long-distance WAN), and the server appears to use a 9000-byte buffer, so it is way slower! This problem presented itself after we removed WAN acceleration.

I analyzed it and told the server team that the server was causing the delay: it was only sending 9K, then WAITING for ACKs. I think delayed ACK may be a secondary factor, since it transfers an odd number of packets (7), then pauses:

1) 1460B
2) 1460B
3) 1460B
4) 1460B
5) 1460B
6) 1460B
7) 240B
Total 9000B
WAIT for ACK…
Repeat.
—–

The buffer that appears to be holding the transfer up is:
socketBuffer
https://tomcat.apache.org/tomcat-6.0-doc/config/http.html
“The size (in bytes) of the buffer to be provided for socket output buffering. -1 can be specified to disable the use of a buffer. By default, a buffers of 9000 bytes will be used.”

Here’s a link in Cloudshark to one of the captures I’m working with. (With IP’s and details obfuscated using tracewrangler)

https://www.cloudshark.org/captures/94b162026653
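
For context on why that 9000-byte buffer hurts so much over a 220 ms path, the arithmetic from the numbers in the comment above works out roughly like this (the 100 Mbit/s figure is just an assumed link speed for illustration):

```python
# Throughput ceiling when the sender stalls after each 9000-byte burst and
# waits a full round trip for the ACK (numbers from the comment above).
buffer_bytes = 9000
rtt_s = 0.220

ceiling_bps = buffer_bytes / rtt_s
print(f"ceiling: {ceiling_bps / 1024:.1f} KiB/s")    # ~40 KiB/s

# To keep the pipe full instead, the in-flight window has to cover the
# bandwidth-delay product (assuming, say, a 100 Mbit/s path):
link_bytes_per_s = 100_000_000 / 8
bdp_bytes = link_bytes_per_s * rtt_s
print(f"window needed: {bdp_bytes / 1024:.0f} KiB")
```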

Reply