My best analysis story:
An application critical to my company was exhibiting performance problems and falling on its face in customer deployments. It was a stock pricing application used at the head of ticker plants in financial companies all around the world. If you had a 401(k) around 2000, it probably depended on this application. I did the sort of analysis you've been describing, specifically of TCP behavior. I pinpointed the problem as being in the OS vendor's implementation of TCP. The buggy behavior was that whenever the sending stack went into congestion control, it never recovered. This resulted in a comically small send window, sometimes just a few multiples of the MSS.
It took a while, battling with account managers and developer support folks at the OS vendor who didn't understand the problem, my explanation, or that the issue *couldn't* be in the application, because the application is blissfully ignorant of TCP machinations. It was like talking to a wall; I started at square one with every conference call. Eventually I got on the phone with a guy with whom I could have a good discussion. It turned out that he had put the RFC 1323 extensions into the stack! The next day I had a patch to the OS in my hands, and the product worked perfectly from that point forward.
The developer explained that there was a bug which caused incoming ACKs *with payloads* to be miscategorized as DUPACKs when the stack was in congestion control.
This would never happen with half-duplex applications like HTTP, but the application I was supporting sent data bidirectionally on the socket at all times.
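To make the failure mode concrete, here's a rough sketch of duplicate-ACK classification. It uses the five conditions later spelled out in RFC 5681, section 2 (outstanding data, no payload, SYN/FIN off, ACK number unchanged, window unchanged). The "buggy" variant is a hypothetical reconstruction that simply skips the no-payload check; I don't know what the vendor's code actually looked like. All class and function names here are made up for illustration, and the sketch ignores 32-bit sequence-number wraparound.

```python
from dataclasses import dataclass

DUPACK_THRESHOLD = 3  # RFC 5681 fast retransmit fires on the third duplicate ACK


@dataclass
class Segment:
    ack: int          # acknowledgment number carried by the incoming segment
    window: int       # advertised receive window
    payload_len: int  # bytes of data carried (nonzero when the peer is also sending)
    syn: bool = False
    fin: bool = False


@dataclass
class SenderState:
    snd_una: int      # oldest unacknowledged sequence number (highest ACK seen)
    snd_nxt: int      # next sequence number to send
    last_window: int  # window advertised in the previous incoming ACK


def is_dupack(seg: Segment, st: SenderState) -> bool:
    """Duplicate-ACK test following the five RFC 5681 conditions (no wraparound handling)."""
    return (
        st.snd_nxt > st.snd_una           # (a) sender has outstanding data
        and seg.payload_len == 0          # (b) the ACK carries no data
        and not seg.syn and not seg.fin   # (c) SYN and FIN both off
        and seg.ack == st.snd_una         # (d) ACK number equals highest ACK seen
        and seg.window == st.last_window  # (e) advertised window unchanged
    )


def is_dupack_buggy(seg: Segment, st: SenderState) -> bool:
    """Hypothetical bug: condition (b) is skipped, so data-bearing ACKs count too."""
    return (
        st.snd_nxt > st.snd_una
        and not seg.syn and not seg.fin
        and seg.ack == st.snd_una
        and seg.window == st.last_window
    )


def count_dupacks(segments, st, classify) -> int:
    """Count how many incoming segments a given classifier treats as duplicate ACKs."""
    return sum(1 for s in segments if classify(s, st))


if __name__ == "__main__":
    st = SenderState(snd_una=1000, snd_nxt=5000, last_window=65535)
    # Bidirectional traffic: the peer's ACK number hasn't moved, but each segment carries data.
    piggybacked = [Segment(ack=1000, window=65535, payload_len=512) for _ in range(3)]
    print(count_dupacks(piggybacked, st, is_dupack))        # 0
    print(count_dupacks(piggybacked, st, is_dupack_buggy))  # 3
```

With the payload check missing, three ordinary data segments from the peer look like three duplicate ACKs, which is enough to trigger fast retransmit and cut the congestion window again. On a socket with data flowing both ways all the time, that can keep happening, which is one plausible way a sender ends up stuck with a tiny window. A request/response protocol rarely has data-bearing ACKs arriving while the sender is in recovery, so it wouldn't hit this path.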
I didn’t have a ton of support from management at the time (my manager even yelled at me for “always wanting to use a sniffer” to fix problems), and nobody but me was looking at the OS vendor’s TCP implementation as the source of the problem. Wrestling the fix from the OS vendor by myself made this victory particularly sweet, earned me a ton of capital to do my own thing, and led to the most interesting problems showing up on my desk.
FWIW, I think about these kinds of things as *protocol* analysis. Yeah, there are packets here, but it's really the protocols that matter, not so much the individual packets. If you come to recognize the individual packet events involved in TCP flows or PIM-sparse setups as the beautiful ballet that they are, then you know that you're on the right track. And then *also* watch that girl on the obstacle course. Amazing.
Wow, great story, Chris!
> Wrestling the fix from the OS vendor by myself made this victory particularly sweet, earned me a ton of capital to do my own thing, and led to the most interesting problems showing up on my desk.
THIS IS WHAT I'M TALKING ABOUT, PEOPLE. You can't get this kind of win without packet-level analysis. And more capital for yourself makes you more valuable. Companies will usually pay to keep their valuable employees, and if they won't, someone else will.
You're right; what I'm talking about is packet-level analysis of protocols at all layers of the stack. I think people understand what I'm getting at, but maybe I should rethink my terminology.
Great posts here, Kary.
Where are you located?
Are you a packet analyst with an organization, or are you independent?
Peter
Old post, but what the hey. Personally, I find the number of IT "network engineers" I've worked around who don't use packet capture to troubleshoot absolutely baffling. Most seem to just use trial and error and hope they luck upon the correct solution (or google the hell out of it and hope they find the right answer in some rando blog). Honestly, I'm not very good at packet capture analysis, but I try, and I've resolved quite a few problems quickly using Wireshark. It may not solve the issue, but it's a quick first look that at least gives you a direction. I've watched people fiddle for hours and hours with routers, switches, firewall rules, IPS systems, etc. when the app or device they're troubleshooting doesn't seem to be talking to the other end. Why they won't take 5 minutes and simply run a capture to see if the packets are making it, I'll never know. Takes too much time to learn? Boring? Too complicated? Too much of a pain in the ass? I honestly don't know.
This ends my rant for the night.