
How Can the Packet Size Be Greater than the MTU?

By Kary

Aug 15

So you’ve got a problem and you decide to fire up Wireshark and take a capture. When you look at the packets, you see a bunch of them that are far larger than the 1500-byte MTU.

[Image: capture showing a frame far larger than the 1500-byte MTU]

HOW CAN THIS BE?!?!?

There’s something you need to know about taking captures on the host that is sending data. Let’s say you’re uploading some data to a server while capturing packets on your machine. You look at the capture and see something like this:

[Image: sender-side capture showing large TSO frames]

Clearly these large packets exceeding the MTU must be part of the problem, right? Probably not. Here’s why.

Many operating systems and NIC drivers support TCP Segmentation Offload (TSO), aka Large Send Offload (LSO), aka Generic Segmentation Offload (GSO). What this means is that the TCP stack hands a large chunk of data to the NIC, and the NIC breaks it up into Maximum Segment Size (MSS) pieces to send on the network. TCP might hand the NIC 16 KB (16,384 bytes) of data, and with a 1460-byte MSS the NIC will break it into MSS-sized bites: 11 segments of 1460 bytes and one segment of the remaining 324 bytes. This offloads the task to the NIC and saves overhead on the host’s resources. It’s a performance thing.
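A minimal Python sketch of that arithmetic, assuming a 1460-byte MSS (a 1500-byte MTU minus 20 bytes of IPv4 header and 20 bytes of TCP header, no TCP options) and reading "16k" as 16,384 bytes. The function name is made up for illustration; a real NIC does this in hardware and also builds headers for every segment it emits.

```python
def tso_segment_sizes(chunk_len: int, mss: int = 1460) -> list[int]:
    """Payload sizes a NIC would put on the wire for one TSO chunk.

    Assumes MSS = 1460 (1500-byte MTU - 20 IPv4 - 20 TCP, no options).
    """
    full, remainder = divmod(chunk_len, mss)
    sizes = [mss] * full          # as many full-size segments as fit
    if remainder:
        sizes.append(remainder)   # one short segment carries the leftovers
    return sizes


sizes = tso_segment_sizes(16 * 1024)
print(len(sizes), sizes[0], sizes[-1])  # 12 1460 324
```

Twelve segments in total: eleven full ones plus the 324-byte tail.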

Here’s the kicker: Wireshark uses libpcap or WinPcap to grab the data before it gets handed to the NIC.

Check it out:

[Image: where libpcap grabs packets, before they reach the NIC]

So you don’t see the actual packets that are put on the wire unless you capture outside the sending host with a tap or SPAN port. This is one of several reasons it’s a good idea to capture traffic outside of the hosts involved in the connection whenever possible.

Here’s what the data looks like captured on the sender and then arriving at the receiver after it has been segmented:

[Image: the same data captured on Host A (sender) and arriving at Host B (receiver)]

This behavior makes TCP sequence number analysis a pain in the ass. If you’re a network troubleshooter using packet analysis, you’ve GOT to be comfortable doing sequence number analysis.
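The rule behind most sequence number analysis is that a sender’s next segment should start at the previous segment’s sequence number plus its payload length. Here is a rough Python sketch of that check, using a hypothetical helper, relative sequence numbers, and no handling of 32-bit wraparound. A sender-side TSO capture and the on-wire view of the same data both pass it; what makes the sender-side view painful is that the receiver’s ACK numbers land on segment boundaries hidden inside a single 16 KB frame.

```python
def check_continuity(segments: list[tuple[int, int]]) -> list[str]:
    """Walk (relative_seq, payload_len) pairs from one direction of a TCP
    stream and report where a segment does not start at the expected seq.

    Assumes relative sequence numbers with no 32-bit wraparound.
    """
    problems = []
    expected = None  # next sequence number we expect to see
    for seq, length in segments:
        if expected is not None and seq != expected:
            kind = "gap" if seq > expected else "overlap/retransmission"
            problems.append(f"{kind}: expected seq {expected}, got {seq}")
        end = seq + length
        # Never move `expected` backwards because of a retransmission.
        expected = end if expected is None else max(expected, end)
    return problems


# Sender-side capture with TSO: two 16,384-byte "segments".
sender_view = [(1, 16384), (16385, 16384)]
# What crossed the wire for the first chunk: 11 x 1460 bytes + 324 bytes.
wire_view = [(1 + i * 1460, 1460) for i in range(11)] + [(16061, 324)]

print(check_continuity(sender_view))  # []
print(check_continuity(wire_view))    # []
print(check_continuity([(1, 1460), (2921, 1460)]))
# ['gap: expected seq 1461, got 2921']
```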

I saw someone post on reddit the other day asking about sequence number interpretation in tcpdump output. The most upvoted comment said that they had been looking at tcpdump output for 15 years and that they had never had to calculate sequence numbers.

WHUT?

I mean, what have you been doing for 15 years, son?

Anyway.

There’s another side to it that I recently saw for the first time: Large Receive Offload (LRO), or Receive Segment Coalescing (RSC). This is the same thing but in reverse. The NIC coalesces TCP segments it receives from a remote host into larger packets before sending them up to the TCP stack. Again, offloading this to the NIC is a performance enhancement, but it’s a pain in my ass.
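A simplified Python sketch of the coalescing idea, assuming every segment belongs to the same flow and arrives in order. Real LRO/RSC implementations also check TCP flags, options, and timers before merging; the `coalesce` name and the 65,535-byte `max_len` cap are illustrative assumptions, not any driver’s actual behavior.

```python
def coalesce(segments: list[tuple[int, bytes]],
             max_len: int = 65535) -> list[tuple[int, bytes]]:
    """Merge in-order, contiguous (seq, payload) segments into larger ones,
    roughly what LRO/RSC does before handing data up to the TCP stack.

    Assumes a single flow and ignores flags, options, and timers.
    """
    merged: list[tuple[int, bytes]] = []
    for seq, payload in segments:
        if merged:
            last_seq, last_payload = merged[-1]
            contiguous = seq == last_seq + len(last_payload)
            if contiguous and len(last_payload) + len(payload) <= max_len:
                # Append this payload to the previous coalesced segment.
                merged[-1] = (last_seq, last_payload + payload)
                continue
        merged.append((seq, payload))
    return merged


wire = [(1, b"a" * 1460), (1461, b"b" * 1460), (2921, b"c" * 324)]
merged = coalesce(wire)
print(len(merged), merged[0][0], len(merged[0][1]))  # 1 1 3244
```

Three wire segments come out as one 3244-byte "packet", which is exactly the kind of frame that looks impossible in a receiver-side capture.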

Check out this capture taken on the client. Notice that this large frame is coming from the server and there’s no way it could have traversed a WAN without fragmentation, so it must be LRO.

[Image: client-side capture showing a large coalesced (RSC/LRO) frame from the server]

One time, I got so annoyed at this behavior that I wrote a Perl script to break the large packets in a capture file into MSS-sized packets, just to make sequence number analysis easier. I don’t know if anyone is interested in that, but I could post it if y’all want. Of course, if you plan ahead, you could just disable segmentation offloading before taking the capture.
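The Perl script itself isn’t posted here. As a hypothetical illustration of the same idea, this Python sketch splits one oversized captured segment into MSS-sized pieces while keeping sequence numbers consistent. It assumes a 1460-byte MSS and only deals with payload bytes and sequence numbers; it does not rebuild headers or checksums, and it cannot recover the real on-wire timing, which is the drawback Jasper raises in the comments.

```python
SEQ_SPACE = 2**32  # TCP sequence numbers wrap modulo 2^32


def resegment(seq: int, payload: bytes,
              mss: int = 1460) -> list[tuple[int, bytes]]:
    """Split one oversized captured segment into (seq, payload) pieces of
    at most `mss` bytes. Headers, timestamps, and checksums are not rebuilt.
    """
    return [
        ((seq + offset) % SEQ_SPACE, payload[offset:offset + mss])
        for offset in range(0, len(payload), mss)
    ]


pieces = resegment(1, bytes(16384))
print(len(pieces), pieces[-1][0], len(pieces[-1][1]))  # 12 16061 324
```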

So next time you take captures on a host sending and receiving traffic, do not be alarmed if you see Really Big Packets™.
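If you want a quick way to list the Really Big Packets in a capture, here is a rough Python sketch that scans a classic libpcap file (the format `tcpdump -w` writes; pcapng is not handled) and flags frames longer than the MTU plus a 14-byte Ethernet header. It assumes Ethernet framing without VLAN tags, and the function name is hypothetical; if you run jumbo frames, pass your jumbo MTU instead. It only flags the frames: whether they point at TSO/LSO or LRO/RSC depends on whether the capturing host sent or received them.

```python
import struct

ETH_HEADER_LEN = 14     # assumption: plain Ethernet framing, no 802.1Q tag
LINKTYPE_ETHERNET = 1


def oversized_frames(data: bytes, mtu: int = 1500) -> list[tuple[int, int]]:
    """Return (frame_number, frame_length) for every frame in a classic
    libpcap capture (not pcapng) longer than mtu + Ethernet header.
    Frame numbers are 1-based, like Wireshark's."""
    if len(data) < 24:
        raise ValueError("too short for a pcap global header")
    magic = data[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"  # little-endian file (usec or nsec timestamps)
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"  # big-endian file
    else:
        raise ValueError("not a classic pcap file")
    (linktype,) = struct.unpack_from(endian + "I", data, 20)
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError(f"expected Ethernet link type, got {linktype}")

    hits = []
    offset, frame = 24, 0
    while offset + 16 <= len(data):
        # Record header: ts_sec, ts_usec/nsec, captured length, original length
        _, _, incl_len, orig_len = struct.unpack_from(endian + "IIII", data, offset)
        offset += 16 + incl_len  # jump over the packet bytes
        frame += 1
        if orig_len > mtu + ETH_HEADER_LEN:
            hits.append((frame, orig_len))
    return hits
```

Usage, with a hypothetical file name: read the capture with `open("upload.pcap", "rb").read()` and pass the bytes to `oversized_frames`.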

Share this post! Spread the packet gospel!


About the Author

I like being the hero. Being able to drop a bucket of root cause analysis on a burning network problem has made me a hero (to some people) and it feels real good, y’all. Get good at packet analysis and be the hero too. I also like french fries.


(10) comments

Derek August 18, 2014

Thanks Kary! Another great insight! Oh hey, I recommended packetbomb to a guy on reddit in /r/networking who was looking for some help with a file server performance issue. Hope you don’t mind. Thanks again!

    Kary August 18, 2014

    Thanks Derek! Yes, I’m always on the lookout for interesting case studies.

jack August 19, 2014

Nice Kary. New learning for me :)

Jasper Bongertz August 19, 2014

Nice post, especially pinpointing where the packets are picked up and why that is too soon to be exact.

I wrote a blog post that covered the same topic, at http://blog.packet-foo.com/2014/05/the-drawbacks-of-local-packet-captures/

In my humble opinion, captures should never be taken on the client or server unless you are aware of the drawbacks and can live with them. So I would not complain about LSO or LRO, CRC errors, etc. when doing local captures, because that’s just what happens when it is done that way.

Also, I would never write a script to break up packets into MSS sizes. When the source (local capture) is already “artificial” it can only get worse by assuming things that may not have happened that way on the wire. E.g. you can only guess the timings etc. But again, if you can live with the drawbacks, go ahead :-)

Cheers,
Jasper

P.S.: there are tons of guys out there that think they know all about TCP, but give them one simple sequence to track and they fail every single time.

    Kary August 19, 2014

    Thanks, Jasper. I dig your site. It’s fantastic.

    For the particular issue I was troubleshooting, breaking it up into separate packets was ok, but you’re right, in general not a very good idea.

    Listen up, people! When I talk about the packet pros, Jasper is one of them. I’ve seen him present at Sharkfest and he knows his stuff. Make sure you subscribe to his site!

Mauricio October 22, 2014

Hi Kary,

Nice text! I think it is possible to disable TCP offload in the NIC.

Regards,

Mauricio.

    fuzik May 23, 2016

    Cool post!
    It’s exactly what I was looking for to solve my problem.
    And to answer the question of how to disable it on Linux:
    ethtool -K vlan563 tso off

    This disables it on the interface “vlan563”.

    And if you want to see the current offload settings for vlan563:
    ethtool -k vlan563

krishna August 31, 2015

Nice article; it helps clear up some doubts. It would be great if the packet dumps mentioned above were attached somewhere so that readers could see for themselves how packets look under TSO. One thing that isn’t clear to me: when TSO is done, are the original TCP options copied to all the segments, or are some changes made to them? I am trying to understand how TSO and MPTCP co-exist (if they do).

Thanks again.
Krishna

Ernest November 21, 2016

Hello Kary

Thanks for posting, very useful information. I have looked at a handful of Wireshark traces now and have seen ‘TCP Segment of reassembled PDU’. By the way, what does PDU stand for? Physical Data Unit?

So basically, are you saying that if this offload behaviour is in action, it is impossible to deduce anything sensible from the TCP sequence/acknowledgement numbers in the normal fashion? Or am I misunderstanding that point?

Yes, I would be very interested in the Perl script, please (I will likely turn it into a PowerShell script, as I am working on Windows).

One last question, please:

Let’s say I have to capture on a Windows Server (the Cisco guys will not set up a SPAN port for me), and I turn off offloading on the Windows NIC. If the host at the other end of the connection (a storage appliance, for example) has offloading enabled, will I also have issues with Seq/Ack numbers?

Thanks very much
Ernest
