Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

SQL Man of Mystery

Wes Brown is a PASS chapter leader and SQL Server MVP. He writes for SQL Server Central and maintains his blog at http://www.sqlserverio.com. Wes is Currently serving as a Senior Lead Consultant at Catapult Systems. Previous experiences include Product Manager for SQL Litespeed by Quest software and consultant to fortune 500 companies. He specializes in high availability, disaster recovery and very large database performance tuning. He is a frequent speaker at local user groups and SQLSaturdays.

Found a Bug in SQL Server 2005

It doesn’t happen often but every once in a while you may be the lucky person to find a previously unknown bug in SQL Server.

It was a normal morning for me, checking the status of our servers going over any failure messages waiting for the day to ramp up. That’s when one of our lead developers came around the corner and told me he had an error when he had tried to create an index on a table he was working on. The more he tried to explain the error the more I started to worry. I had him send me the code and the error statement.

Location:     BtreeMgr.cpp:5372

Expression:   bufferLen > currOffset + ACCESSSOR_OVERHEAD

SPID:         116

Process ID:   5016

Msg 3624, Level 20, State 1, Line 2

A system assertion check has failed. Check the SQL Server error log for details. Typically, an assertion failure is caused by a software bug or data corruption. To check for database corruption, consider running DBCC CHECKDB. If you agreed to send dumps to Microsoft during setup, a mini dump will be sent to Microsoft. An update might be available from Microsoft in the latest Service Pack or in a QFE from Technical Support.

Msg 0, Level 20, State 0, Line 0

A severe error occurred on the current command.  The results, if any, should be discarded.

I had what we like to call in the high availability space a “pucker moment”. This wasn’t your normal, I typed something wrong and got an error, kind of problem. This was a real SEV 20 with an assert, the core engine had just puked on something it shouldn’t have.

Like all good DBA’s the first thing I did was run a DBCC on the database this error was generated from.

While that was going on I asked my very good friend, Google, if he had seen this particular assert before. For the first time in a very long time Google failed me! In the last few years if I hit this kind of hard error someone else has too and it is ether being fixed in a hot fix or addressed in the next version of SQL Server, but not this time.

So, we have this same schema on another server and the developer tried the exact same code there and had the exact same error.

I had him document the steps he took to get to this point and to his credit the steps were clear, concise and easily reproducible.

The DBCC finished with zero problems detected, which let me calm down a bit. That coupled with the fact it looked like I had a repeatable test case When the second database had cleared the DBCC I set about my task of reproducing the error and trying to find a work around. Lucky for us it was a simple matter of column organization in the index and we were able to apply it successfully and carry on with life.

I bundled up the documentation I had accumulated, ran the test case confirmed the bug and sent it off to the powers that be at Microsoft. Since we had a work around and it wasn’t a show stopper I didn’t raise it as a critical server down issue but Microsoft still worked it in a timely fashion.

So, what was the problem you say? It was an interesting edge condition.

We have a table that contains a composite primary key and the rest is made up of bit fields, a flag table.

We had to add a new column, another bit flag, to the table.

The non-clustered covering index was dropped the column was added to the end of the table.

The index was updated with the new column at the end of the column list and then *POOF* it blew up.

I think it has to do with two specific things.

First, bit fields are stored in a compact manor where multiple bits share a byte and aren’t truly separate from every other but field. It would be a huge waste of space to store each bit in it’s own byte but it would make things like this index issue less likely to happen.

Secondly we did a column add but didn’t drop and recreate the table repopulating it in the process so things at the page level weren’t nice and neat. The underlying clustered index wasn’t effected but when we tried to add an index back with the new field it couldn’t do it. The fix was simple, change the column order in the non-clustered index moving the new column up one. We verified the data without the index was correct and with the index was correct.

I haven’t tried it yet, but I am betting included columns won’t suffer the assert ether since the items don’t have to be sorted in the index.

So there you go! Having been on the software side of things a couple of times I always find it interesting when I find bugs in others products and work the issue to conclusion.

What is your take away from all of this? Never be afraid to submit a bug report to Microsoft. I have heard people say to the effect someone else will or has hit the bug and they will submit it. DON’T RELY ON THE ACTIONS OF OTHERS! Reporting bugs helps the community as a whole and makes the product better. When you assume someone else is worked it you are putting YOUR production servers in the hands of strangers. If someone has submitted it and it didn’t turn up in a search they will let you know, and be very kind about it to boot. You will get piece of mind that it is being worked on and it is a bug, or that you may keep someone else from stumbling onto this and not having the knowledge to fix it or work around it.

Wes

Comments

Posted by Tim Mitchell on 1 June 2009

Excellent point.  It's very easy to just move on when you solve the problem, even if it truly is a bug in SQL Server.  However, reporting these issues to the folks that can resolve them on a global level helps everyone.

Posted by Jorge Segarra on 1 June 2009

Wow, always cool to see new bugs and (thankfully) a successful resolution. Did you by chance utilize MS Connect site for this or was this done solely through MS support call? There was an interesting back/forth this weekend on Twitter regarding the use of Connect and publicizing specific Connect issues.

Posted by Wesley Brown on 2 June 2009

I used connect for the first time. Usually, where I work they have always had a select A agreement with 4 hour turn around for pretty much everything but this is a very small shop and I didn't want to incur any cost if it wasn't a bug so connect was the way to go.

Posted by drnetwork on 5 June 2009

I remember my first bug I found. It's always a little bit fun. Mine was where anyone that could get logged into a SQL Server 2000 instance could hang the instance with a SQL string of less than 100 characters. I don't think it's been fixed yet. But then who uses SQL 2000 any more...

Posted by Logan D. Merazzi on 5 June 2009

drnetwork... yes, there are people who still uses SQL2000, believe me... unfortunately :(

But about the post, it's always good to see those problems and, obviously, the solution. ;)

Best regards.

Posted by Jerry Sommerville on 5 June 2009

I appreciate it when someone researches and documents bugs and then tells us about them.  The process you went through helps all of us in our efforts to resolve errors and bugs.  Thanks for your efforts.

Posted by rhughes on 5 June 2009

Quite a few more new ones in SS 2008.. new one after installing SS 2008 sp1 is that OLE DB services for Oracle no longer work in SSIS Import Wizard complaining of support files not found, yet no problem when using same methods via linked server option.. I had posted this a few weeks ago via the MS SQL Server Forum site, since then another batch of hot fixes has been released since sp1 for SS 2008, but fix for that problem not included yet..

Posted by sashikanta.mishra on 5 June 2009

Hey Wesley,

Gr8 effort by you and your Team!!! Hats up to You guys !!

I would really appreciate you for three things...

First : You/Your Team has identified a bug in SQL server    

       2005 which i think the best among other release.

Second : You have found out the a Work-around  for that

        (Usually people put the bugs/issues to resolve by  

         others).

Third  : You just documented, tested , re-tested ,

        regenerated , reproduced and shared.

Kepp it up!!!

Regards!!!

Posted by HLogic on 7 June 2009

Very interesting.  I encountered very similar bug in SQL Server 2000 (SP1, I think).  Bit column as the last column in a non-clustered index...  Unfortunately, that bug did not rear its ugly head during creation but instead, during utilization.  A nightly reorg process would die in midstream with a cryptic 'the end is nigh' error msg.  I wonder if this one may have slipped through the cracks - again.

Leave a Comment

Please register or log in to leave a comment.