Do the Packet Trace - Especially When Things Are Good

Last night we were troubleshooting really poor performance on a core system. One of the things being investigated was a network switch. On every interface of the switch we were seeing flow control pause counters continually incrementing. This is usually an indication that the systems connected to those interfaces are saying, "Whoa, can't handle the traffic, pause a bit, and let me catch up." But the fact that we were seeing this on every interface, even for brand new servers that were doing absolutely nothing, seemed to indicate something was going on at the switch level. We had vendor support on the phone and they were flagging the counters and saying, "Your servers can't keep up. This may be the cause of your performance problems."

Even though I'm a DBA now, because the issue was big enough, I was brought back over to provide any assistance I could yesterday afternoon. I had already done a packet trace. I had seen all the MAC control datagrams saying pause. They looked a lot like this post here. So we immediately said, "Hey, we shouldn't be seeing these. Let's turn off flow control everywhere." We did - on the server and the switch. The flow control messages kept coming. What?!?

What made things more interesting is that when we looked at the source MAC address of these messages (this is at layer 2), it wasn't the MAC address the OS recognized. It was the same vendor, and the address was only 1 or 2 off in the last octet. So the messages seemed tied to the same converged network adapter (CNA), but we couldn't explain the MAC address any more than we could explain the flow control messages.

Having concluded that the servers weren't under load and so had no reason to be sending pause messages, one of the things I eventually keyed in on was the pause time, which is measured in quanta. It was 0. This means "resume sending immediately." Basically, if I send a message saying pause, I set a number of quanta for you to wait. If I send another pause message before the first one has expired, you overwrite the remaining time with the new value. So a pause time of 0 basically means: if you were pausing, stop, and give me the data. And we were seeing a ton of these. I still couldn't explain the cause of the messages, but what it did tell me was that the counter incrementing on the switch was a red herring.

Our switch vendor did some research in their case histories and found that at least two of the generation 1 CNAs are known to do this in order to keep traffic flowing. Lo and behold, that's what we have: gen 1 CNAs. So the adapter was automatically sending out the messages independent of the OS, and that explained the different MAC address. It was not the real problem, and we could safely ignore the flow control pause messages we were seeing. That meant the track they were taking was a dead end.
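To make the pause frame details above concrete, here is a minimal Python sketch of how one might decode a captured frame by hand. It relies on the standard IEEE 802.3x layout (EtherType 0x8808, opcode 0x0001, a 16-bit pause time counted in quanta of 512 bit times). The MAC addresses, the sample frame, and the 10 Gb/s link speed are made up for illustration; they are not taken from our trace.

```python
import struct

MAC_CONTROL_ETHERTYPE = 0x8808  # IEEE 802.3 MAC Control frames
PAUSE_OPCODE = 0x0001           # 802.3x PAUSE
QUANTUM_BITS = 512              # one pause quantum is 512 bit times


def format_mac(raw: bytes) -> str:
    """Render six raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


def parse_pause_frame(frame: bytes):
    """Return (source MAC, pause quanta) for a PAUSE frame, else None."""
    if len(frame) < 18:
        return None
    # Bytes 12-17: EtherType, MAC Control opcode, pause time (big-endian).
    ethertype, opcode, quanta = struct.unpack("!HHH", frame[12:18])
    if ethertype != MAC_CONTROL_ETHERTYPE or opcode != PAUSE_OPCODE:
        return None
    return format_mac(frame[6:12]), quanta


def pause_seconds(quanta: int, link_bps: int) -> float:
    """Convert a pause time in quanta to seconds at a given link speed."""
    return quanta * QUANTUM_BITS / link_bps


def last_octet_gap(mac_a: str, mac_b: str):
    """If two MACs share their first five octets, return how far apart
    the last octets are; otherwise return None."""
    a = bytes.fromhex(mac_a.replace(":", ""))
    b = bytes.fromhex(mac_b.replace(":", ""))
    if a[:5] != b[:5]:
        return None
    return abs(a[5] - b[5])


# Hypothetical sample: a PAUSE frame with a pause time of 0 quanta.
SAMPLE_FRAME = (
    bytes.fromhex("0180c2000001")    # destination: reserved multicast address
    + bytes.fromhex("001122334456")  # source: made-up adapter MAC
    + bytes.fromhex("8808")          # EtherType: MAC Control
    + bytes.fromhex("0001")          # opcode: PAUSE
    + bytes.fromhex("0000")          # pause time: 0 quanta (resume now)
    + bytes(42)                      # zero padding up to 60 bytes
)

if __name__ == "__main__":
    OS_MAC = "00:11:22:33:44:55"  # made-up address the OS reports
    LINK_BPS = 10_000_000_000     # assumed 10 Gb/s link

    src, quanta = parse_pause_frame(SAMPLE_FRAME)
    print("source MAC:", src)
    print("pause quanta:", quanta)
    print("pause seconds:", pause_seconds(quanta, LINK_BPS))
    print("last-octet gap from OS MAC:", last_octet_gap(OS_MAC, src))
```

A pause time of 0 decodes to zero seconds at any link speed, which is why these frames amount to "resume now" rather than "slow down." The last-octet comparison is just a quick way to spot a sender that sits right next to the address the OS knows about.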

Here's why I say to do the packet trace especially when things are good. The networking guys indicated they've been seeing these counters increment all along, ever since we put the switches in. However, no one had collaborated to do packet traces on these servers to investigate what was going on. On previous support calls the switch vendor had indicated we could ignore these counter increments because they weren't related to whatever issue we were having. Why this time was different, I don't know. However, had we done a packet trace when performance was good, we'd have seen the pause time of zero and the unexplained MAC address then, and we'd have worked out an explanation then. That means we wouldn't have chased that trail in the wee hours of this morning, because we would have known it was expected traffic. It's always a good idea to look at your system carefully when it's working fine. That helps you see what is out of place when things aren't going so well. And this is especially true at the network layer. I can't tell you how many times I've discovered an issue at the server level because of a packet trace. But that's a post for another time.

Book Review: Big Red - Voyage of a Trident Submarine

by Andy Warren

SQLServerCentral.com

Blogs

I've grown up reading Tom Clancy and probably most of you have at least seen Red October, so this book caught my eye when browsing used books for a recent trip. It's a fairly human look at what's involved in sailing on a Trident missile submarine...

2009-03-10

1,439 reads

Database Mirroring FAQ: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup?

by Robert Davis

SQLServerCentral.com

Blogs

Question: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup? This question was sent to me via email. My reply follows. Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup? Databases to be mirrored are currently running on 2005 SQL instances but will be upgraded to 2008 SQL in the near future.

2009-02-23

1,567 reads

Inserting Markup into a String with SQL

by Phil Factor

SQLServerCentral.com

T-SQL

In which Phil illustrates an old trick using STUFF to insert a number of substrings from a table into a string, and explains why the technique might speed up your code...

2009-02-18

1,631 reads

Networking - Part 4

by Andy Warren

SQLServerCentral.com

Blogs

You may want to read Part 1 , Part 2 , and Part 3 before continuing. This time around I'd like to talk about social networking. We'll start with social networking. Facebook, MySpace, and Twitter are all good examples of using technology to let...

2009-02-17

1,530 reads

Speaking at Community Events - More Thoughts

by Andy Warren

SQLServerCentral.com

Blogs

Last week I posted Speaking at Community Events - Time to Raise the Bar?, a first cut at talking about to what degree we should require experience for speakers at events like SQLSaturday as well as when it might be appropriate to add additional focus/limitations on the presentations that are accepted. I've got a few more thoughts on the topic this week, and I look forward to your comments.

2009-02-13

360 reads
