How Can the Packet Size Be Greater than the MTU? – PacketBomb

How Can the Packet Size Be Greater than the MTU?

By Kary | text

Aug 15

So you’ve got a problem and you decide to fire up Wireshark and take a capture. When you look at the packets you see a bunch of them that are far larger than the 1500 byte MTU.

[Image: LargeFrame]

HOW CAN THIS BE?!?!?

There’s something you need to know about taking captures on the host that is sending data. Let’s say you’re uploading some data to a server while capturing packets on your machine. You look at the capture and see something like this:

[Image: TSO]

Clearly these large packets exceeding the MTU must be part of the problem, right? Probably not. Here’s why.

Many operating systems and NIC drivers support TCP Segmentation Offload (TSO), aka Large Send Offload (LSO). Linux’s Generic Segmentation Offload (GSO) is a related software version of the same idea. What this means is that the TCP stack sends a chunk of data for the NIC to break up into Maximum Segment Size (MSS) pieces to send on the network. TCP might hand the NIC 16k of data (16,384 bytes) and the NIC will break it into MSS-sized bites: with a 1460-byte MSS, that’s 11 segments of 1460 bytes and one segment of the remaining 324 bytes. This offloads the task to the NIC and saves overhead on the host’s resources. It’s a performance thing.
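If you want to check that arithmetic yourself, here’s a minimal Python sketch of the split. It assumes a 1460-byte MSS (a 1500-byte MTU minus 20 bytes of IP header and 20 bytes of TCP header, no options), and `segment_sizes` is just a name I made up for illustration, not anything from a real stack or driver.

```python
def segment_sizes(payload_len, mss=1460):
    """Return the payload sizes one large send would be cut into on the wire."""
    full, remainder = divmod(payload_len, mss)
    sizes = [mss] * full          # as many full MSS-sized segments as fit
    if remainder:
        sizes.append(remainder)   # plus one short segment for what is left
    return sizes

sizes = segment_sizes(16 * 1024)  # the 16k example from the text
print(len(sizes), sizes[0], sizes[-1])
```

For the 16k example this gives 12 segments: 11 of 1460 bytes and a final one of 324 bytes.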

Here’s the kicker: Wireshark uses libpcap or winpcap to grab the data before it gets handed to the NIC.

Check it out:

[Image: libpcap-trpy]

So you don’t see the actual packets that are put on the wire unless you capture outside the sending host with a tap or span port. This is one of several reasons it’s a good idea to capture traffic outside of the hosts involved in the connection whenever possible.

Here’s what the data looks like captured on the sender and then arriving at the receiver after it has been segmented:

[Image: HostA-HostB]

This behavior makes TCP sequence number analysis a pain in the ass. If you’re a network troubleshooter using packet analysis, you’ve GOT to be comfortable doing sequence number analysis.
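Here’s a rough sketch of the bookkeeping involved, assuming a 1460-byte MSS and a hypothetical helper called `wire_segments`. Given one oversized segment as seen in the sender’s capture, it lists the sequence numbers you’d expect on the wire, along with the ACK number the receiver could send back after each piece.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers wrap at 32 bits

def wire_segments(seq, payload_len, mss=1460):
    """Yield (seq, length, next_seq) for each on-wire piece of one large captured segment."""
    offset = 0
    while offset < payload_len:
        length = min(mss, payload_len - offset)
        start = (seq + offset) % SEQ_SPACE
        yield start, length, (start + length) % SEQ_SPACE
        offset += length

# One 4380-byte "packet" seen on the sender, starting at relative sequence number 1.
for start, length, next_seq in wire_segments(seq=1, payload_len=4380):
    print(f"seq={start} len={length} -> receiver may ACK {next_seq}")
```

In this example the receiver can ACK 1461, 2921, and 4381. The first two fall in the middle of what the sender’s capture shows as a single packet, which is why the ACKs can look like they refer to segments that were never sent.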

I saw someone post on reddit the other day asking about sequence number interpretation in tcpdump output. The most upvoted comment said that they had been looking at tcpdump output for 15 years and that they had never had to calculate sequence numbers.

WHUT?

I mean, what have you been doing for 15 years, son?

Anyway.

There’s another side to it that I recently saw for the first time: Large Receive Offload (LRO), or Receive Segment Coalescing (RSC). This is the same thing but in reverse. The NIC coalesces TCP segments it receives from a remote host into larger packets before sending them up to the TCP stack. Again, by offloading this to the NIC, it’s a performance enhancement but a pain in my ass.
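As a toy illustration of the coalescing idea, here’s a short Python sketch that merges in-order, back-to-back segments into bigger ones. It’s a simplification under my own assumptions: real LRO/RSC implementations also look at flags, options, timing, and size limits, none of which is modeled here, and `coalesce` is a hypothetical name.

```python
def coalesce(segments):
    """Merge contiguous (seq, length) segments, roughly the way receive coalescing presents them."""
    merged = []
    for seq, length in segments:
        if merged and merged[-1][0] + merged[-1][1] == seq:
            # This segment starts exactly where the previous one ended: glue it on.
            prev_seq, prev_len = merged[-1]
            merged[-1] = (prev_seq, prev_len + length)
        else:
            # Gap or first segment: start a new packet.
            merged.append((seq, length))
    return merged

on_the_wire = [(1, 1460), (1461, 1460), (2921, 1460), (5841, 1460)]
print(coalesce(on_the_wire))
```

The first three segments are contiguous, so the capture on the receiver would show them as one 4380-byte packet, while the segment after the gap stays separate.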

Check out this capture taken on the client. Notice that this large frame is coming from the server and there’s no way it could have traversed a WAN without fragmentation, so it must be LRO.

[Image: RSC]
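That reasoning boils down to a rule of thumb you could sketch like this. It assumes the capture is taken on one of the endpoints, a 1500-byte MTU, and untagged Ethernet framing; `offload_hint` is a made-up helper, and the result is only a hint, not proof.

```python
ETHERNET_HEADER_LEN = 14  # untagged Ethernet II header; a VLAN tag adds 4 more bytes

def offload_hint(frame_len, outbound, mtu=1500):
    """Guess why a locally captured frame is bigger than the MTU allows."""
    if frame_len <= mtu + ETHERNET_HEADER_LEN:
        return "fits in the MTU"
    if outbound:
        return "probably send-side segmentation offload (TSO/LSO)"
    return "probably receive-side coalescing (LRO/RSC)"

print(offload_hint(2974, outbound=False))
```

An oversized frame leaving the capturing host points at segmentation offload, and an oversized frame arriving at it points at receive coalescing.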

One time, I got so annoyed at this behavior that I wrote a Perl script to break large packets in a capture file into MSS-sized packets just to make sequence number analysis easier. I don’t know if anyone is interested in that, but I could post it up if y’all wanted. Of course, if you plan ahead you could just disable segmentation offloading before taking the capture.
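On Linux, offloads are usually toggled with `ethtool -K`. Here’s a small Python sketch that only builds the command line so you can review it before running it as root. The interface name `eth0` and the feature list are assumptions, and some drivers mark certain features as fixed, so check what your NIC reports with `ethtool -k` first and remember to turn things back on afterwards.

```python
import shlex

def disable_offload_cmd(interface, features=("tso", "gso", "gro", "lro")):
    """Build (but do not run) an ethtool command that turns the given offloads off."""
    argv = ["ethtool", "-K", interface]
    for feature in features:
        argv += [feature, "off"]
    return argv

# "eth0" is a placeholder; substitute the interface you are capturing on.
print(shlex.join(disable_offload_cmd("eth0")))
```

This prints `ethtool -K eth0 tso off gso off gro off lro off`. Note the capital `-K`: the lowercase `-k` only shows the current settings.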

So next time you take captures on a host sending and receiving traffic, do not be alarmed if you see Really Big Packets™.

About the Author

I like being the hero. Being able to drop a bucket of root cause analysis on a burning network problem has made me a hero (to some people) and it feels real good, y’all. Get good at packet analysis and be the hero too. I also like french fries.

Leave a Comment:

(36) comments

Derek August 18, 2014

Thanks Kary! Another great insight! Oh hey, I recommended packetbomb to a guy on reddit in /r/networking who was looking for some help with a file server performance issue. Hope you don’t mind. Thanks again!

    Kary August 18, 2014

    Thanks Derek! Yes, I’m always on the look out for interesting case studies.

jack August 19, 2014

Nice Kary. New learning for me :)

Jasper Bongertz August 19, 2014

Nice post, especially pinpointing where the packets are picked up and why that is too soon to be exact.

I wrote a blog post that covered the same topic, at http://blog.packet-foo.com/2014/05/the-drawbacks-of-local-packet-captures/

In my humble opinion captures should never be taken on client or server unless you can live with the drawbacks and are aware of them. So I would not complain about LSO or LRO, CRC errors etc. if doing local captures, because that’s just what happens if it is done that way.

Also, I would never write a script to break up packets into MSS sizes. When the source (local capture) is already “artificial” it can only get worse by assuming things that may not have happened that way on the wire. E.g. you can only guess the timings etc. But again, if you can live with the drawbacks, go ahead :-)

Cheers,
Jasper

P.S.: there are tons of guys out there that think they know all about TCP, but give them one simple sequence to track and they fail every single time.

    Kary August 19, 2014

    Thanks, Jasper. I dig your site. It’s fantastic.

    For the particular issue I was troubleshooting, breaking it up into separate packets was ok, but you’re right, in general not a very good idea.

    Listen up, people! When I talk about the packet pros, Jasper is one of them. I’ve seen him present at Sharkfest and he knows his stuff. Make sure you subscribe to his site!

      Jasper Bongertz August 20, 2014

      Thanks, your site is very cool, too. I’ll keep coming back ;-)

    foulsoft October 4, 2019

    Hello Jasper,

    The link you’re providing is useful as well… And probably right… The one thing is that for us middleware guys (that’s where I’m from), it’s useful to capture the TLS “secrets” while the conversation is running… and so be able to decrypt the SSL/TLS conversation(s)… I don’t even know how to do it without capturing a libcurl client’s SSL/TLS keys… and giving them to Wireshark to decode the SSL/TLS traffic. So in my humble opinion, strictly follow the instructions at https://jimshaver.net/2015/02/11/decrypting-tls-browser-traffic-with-wireshark-the-easy-way/ and https://www.youtube.com/watch?v=hh9SRJpK5hI in order to actually be able to decode the whole conversation…

Mauricio October 22, 2014

Hi Kary,

Nice text! I think that it is possible to disable TCPoffload in the NIC.

Regards,

Mauricio.

    fuzik May 23, 2016

    Cool post!
    It’s really what I was looking for with my problem.
    And to answer the question, here’s how to disable it in Linux:
    ethtool -K vlan563 tso off

    To disable it on NIC “vlan563”.

    And if you want to see actual status of the TCP offload for the vlan563:
    ethtool -k vlan563

krishna August 31, 2015

Nice article to help clear doubts. It would be great if the packet dumps mentioned above were attached somewhere so that users can see for themselves how packets look under TSO. One thing not clear to me is, when TSO is done, are the original TCP options copied to all the segments, or are some changes made to them? I am trying to understand how TSO and MPTCP co-exist (if they do).

Thanks again.
Krishna

    Jason March 28, 2018

    You could confirm this by doing a capture on both the client and a spanned port of the client interface :)

Ernest November 21, 2016

Hello Kary

Thanks for posting, very useful information. I have looked at a handful of Wireshark traces now and have seen ‘TCP Segment of reassembled PDU’. By the way, what does PDU stand for? Physical Data Unit?

So basically, are you saying that if this offload behaviour is in action, it is impossible to deduce anything sensible from the TCP sequence/acknowledgement numbers in the normal fashion? Or am I misunderstanding that point?

Yes, I would be very interested in the Perl script please (I will likely turn it into a PowerShell script as I am working on Windows)

One last question please

Let’s say I have to capture on a Windows Server (as the Cisco guys will not set up a span port for me). I then turn off offloading on the Windows NIC. If the host at the other end of the connection (a storage appliance, for example) has offloading enabled, will I also have issues with Seq/Ack numbers?

Thanks very much
Ernest

shiyao March 21, 2017

Wondering how you draw this picture: http://packetbomb.com/wp-content/uploads/2014/08/libpcap-trpy.png

What’s the font and what’s the tool used?

Saqib March 30, 2017

So, supposing that I don’t have access to the network switch. If I use a third machine with a soft switch between the sender and receiver and configure the soft switch to dump all frames to the hard drive, do you think that would be any different? I’m not even sure if the soft switches available today have that feature. Ignoring, of course, if the traffic is such that it can be captured in software without losing frames.

Marcelo Lima November 26, 2017

Really nice post! Thank you!

    Kary November 26, 2017

    Thanks Marcelo! Long time no see, hope things are going well

K_charo December 13, 2017

This post saved my life!
It’s well written and easy to understand.
Thanks very much!

One point: I can’t believe that the guy who’s been looking at captures for 15 years had never had to follow the sequence numbers…
I came here several months ago and even I’m already doing this…

Anyways, thanks a lot!

Vinicius Ferreira July 24, 2018

Very nice content Kary. I have a question though.

I have captured the traffic from the Vyatta, which is the FW in front of my server. I still see packets there that are greater than the MTU. Is it possible that the Vyatta is also doing LRO as you explained above?

In the dump I am seeing packets of 2764 bytes, although the MTU is set to 1400 on the interface of my server. How is that possible?

Thanks.

Will July 26, 2018

Nice explanation, but I have one question. Exactly what # within the Wireshark capture represents the MTU size?

I’ve been banging my head trying to pinpoint which figure is the MTU.

Thank you so much,

Will

david.woo December 6, 2018

Thank you for your post.
This post helped me a lot.

Sanjeev June 30, 2019

Good one! Thanks.

Ben October 17, 2020

Hi
I am getting large packets as you described above so as you recommended I changed the setting to the following:

ethtool -k eth0 | grep tcp-segmentation-offload

I am still getting large packets. Am I doing something wrong?

I would love to get the script you mentioned above to see if it will solve the problem. If you could email it to me I’d really appreciate it.

Thank you,
Ben

    Kary October 17, 2020

    Hey Ben, I don’t think that command is going to turn it off. I think someone posted it in a comment but if not, a quick google will turn it up. Sorry, I’m afk at the moment

      ben October 18, 2020

      Thank you,
      Ben

Ezequiel October 22, 2020

Thank you! I have an issue related to this behavior

Yasin December 23, 2020

Hi Kary, thanks to you I’m getting better at packet analysis, except for this one (: Hope you will answer my question below.

My question is:
I captured an ISO download from the web while studying packet analysis… In Wireshark I’m seeing the server send packets of 2946, 2774, 9698, and 13026 bytes (headers included)… MTU is 1500 and LRO is disabled on my laptop. The large packet sizes vary from 2.9k to 13k bytes. The Don’t Fragment flag is set on the packets. And packets from the server appear microseconds apart, while the 3-way handshake shows 35 ms RTT in Wireshark and ping shows an average RTT of 280+ ms… What could cause this situation? What am I possibly missing?
Thanks in advance…

Yasin December 23, 2020

Hi again, it seems that “generic-receive-offload” was on… Turned it off and the big packets are gone. (:

Riyaz March 23, 2021

What happens to the IP Identifier? Will it be the same for all the packets that are segmented on the NIC level?
What Happens to the Sequence Numbers and Ack numbers ?

Eric Lafontaine April 26, 2021

Thanks a lot for this article, it clarified something for a client of mine.

Eric M June 14, 2021

Oh boy. So I’m looking at a packet capture here that was taken off of a vmware server inside a large cloud service provider and WireShark is telling me there are packets up to almost 15,000 bytes, I’m thinking WTH! I come across your explanation and I had forgotten all about this. I don’t even remember the last time I saw this, probably because 99.9999% of our captures are done with TAP’s or SPAN’s. Definitely made some notes in onenote and saved the link! Thanks for the explanation Kary!

Daren Matthews July 27, 2021

Hi Kary, I hope you are well. I’m an ex-colleague of yours (and we met in SFO). You might remember me. You mentioned that perl script to break up the packets to the MSS size. I’d be really interested in seeing that. Meanwhile, all the best to you and Packetbomb.

    Kary July 27, 2021

    Hi Daren, yes, of course. I’ll never forget beating my head against a throughput issue and you casually asked if it was just hitting the license bandwidth limit. It never occurred to me to think of that and of course that’s what it was. I’ll have to look and see if I still have that script. I must warn you and anyone else, it’s not something that should be used for anything important or trusted in any way, more of an academic exercise.

      Daren Matthews July 28, 2021

      Thanks Kary much appreciated. Great to hear from you. And for your great presentations at SharkFest Wireshark Developer and User Conference.

      Cheers, Daren

Yannick December 7, 2021

Hello,

Thanks a lot! We were scratching our heads as to why on the emitting end the packet was >1500 bytes, but chunked into smaller packets at the receiving end when running Wireshark on the server and the client.
All this while the IP Don’t Fragment flag was set, and fragmentation was not used anyway.
And the server seemed happy to receive ACKs for TCP packets it did not send? We felt we were missing something in Wireshark.

I used “ethtool -K bond1 tso off” and then I could see in Wireshark the actual TCP segmentation going on as expected on the server (and switched it back on right after).

So thanks again for this great explanation !

Matt Rome December 16, 2021

Thank you sir!

I am 2/3 to my CCNP cert so I understand this stuff, yet I don’t know the console commands, nor the various strategies to utilize to efficiently parse the data.

Question– What is the best literature to read to master the Wireshark tool as well as its various methods for strategic application?

Thank you!

-Matt
