IAM Page Out of Range

  • This is a nasty one.

    One of our test servers coughed up variations of the following hairball in droves:

    Table error: IAM page (1:940) (object ID 555253133, index ID 0) is out of the range of this database.

    The database itself became corrupt. Fortunately, I have a good backup that checked out well on its DBCC CHECKDB that was run shortly before the symptoms began to appear. However, I'm anxious to locate the cause of this in order to ensure it does not proliferate, and at the moment all I have are vague indications, not hard truth, which is why I seek the feedback of the good folks here at SQLServerCentral.

    I went to Google and found the following article:

    http://support.microsoft.com/default.aspx?scid=http://support.microsoft.com:80/support/kb/articles/Q278/3/66.asp&NoWebContent=1

    The symptoms all seem to apply. It is possible that concurrent DBCC SHRINKDB executions were (mistakenly) running against the problem database at the same time. The article seems also to implicate heaps in the matter, and yes, there are a few heaps, also some tables without clustered indexes.

    However, the article claims this was a problem with SQL Server 7, prior to SP3. We are running SQL Server 2000, SP3. Either we have a different problem, or else Microsoft needs to fess up.

    I attempted a DBCC CHECKDB, ultimately with the "fix it -- do whatever you have to do" option (repair & allow data loss). No go. There may be a way to fix such issues, but I don't know what that would be. The error messages were vast and menacing, but the basic answer was, "No."

    Any ideas? Thanks in advance!

    Edited by - Lee Dise on 10/01/2003 08:40:59 AM

  • quote:


    Any ideas? Thanks in advance!


    Call Microsoft PSS, but be prepared to prove to them that it's not a hardware issue...

    --Jonathan



    --Jonathan

  • I agree. PSS can help, but will likely point to hardware as the problem. These kinds of things can be very, very difficult to track down and could easily be a hardware glitch in writing the data under load.

    Steve Jones

    sjones@sqlservercentral.com

    http://www.sqlservercentral.com/columnists/sjones

    The Best of SQL Server Central.com 2002 - http://www.sqlservercentral.com/bestof/

    http://www.dkranch.net

  • Thanks for the insight -- I usually consider hardware last, and that's not necessarily the right place for such considerations.

    Right now, I'm searching the web and SQL Books for any in-depth information on database repairing. I don't see a whole lot. Frankly, it looks like there is little you can do once a database becomes corrupt.

    "Repairing and allowing data loss" strikes me as comparable to fixing a pimple on the cheek with a guillotine. All that seems to matter at that point is whether you have a backup.

    I have the old database backup file from prior to glitch, and I'm about to restore it. I wonder what's going to happen. Is it possible for a restored database that registers clean on the DBCC CHECKDB chart to contain incipient corruption that will shortly manifest itself?

    Arthur C. Clarke once mused that any technology sufficiently advanced beyond one's level of understanding is, effectively, magic. I find myself now operating at about that level.

  • What indexes are on this table and does it have any text/ntext/image fields?

    Keith Henry

    DBA/Developer/BI Manager




    Keith Henry



    According to everyone I know I "do something with computers?" for a living, so there you go.

  • quote:


    Is it possible for a restored database that registers clean on the DBCC CHECKDB chart to contain incipient corruption that will shortly manifest itself?


    Everything is possible but IMHO the probability is rather low. As far as my experience goes, we had no problems persisting if the CHECKDB finished with clean result.

  • keithh: There are probably 270 tables in the database, a substantial number of which are heaps (unfortunately) and also many that have indexes but lack clustered indexes. There are some TEXT columns as well, unfortunately, but they are not heavily stressed -- a VARCHAR declaration would have been a better choice for all of them.

    Vladan: Thanks for your insight on CHECKDB. For grins, I ran CHECKDB this morning on the restored "good" copy and it checks out clean. I'm running a SHRINKDB on it as we speak to see if I can provoke it into misbehaving again.

    BTW: Once of these days I plan on visiting the Czech Republic. In a former life, I was a music student and my thesis topic was on the music of Leos Janacek. He and Smetana are two of my favorites.

    Update: I was able to duplicate the corruption! After restoring the 'good' copy and verifying it was good with CHECKDB, I ran a procedure that re-runs CHECKDB, runs update stats, runs DBREINDEX, and then finally SHRINKDB.

    The fact that I was able to duplicate the issue on the same database -- after DBCC gives it a clean bill of health -- would seem to me to implicate a bug in SQL Server, unless someone can talk me out of it. It could be hardware, but what are the odds that the same glitch would recur on the same database?

    The aforementioned Knowledge Base article speaks ill of concurrent runs of SHRINKDB, but in this case, there was only one SHRINKDB execution. Odd!

    Edited by - Lee Dise on 10/02/2003 07:37:30 AM

  • I get this error occasionally also. Same arrangement, SQL 2000 SP3. Usually this indicates that one or more indexes on the rogue table are corrupt. By running a dbcc dbreindex, it takes care of the problem. This might be treating the symptoms instead of fixing the problem. I have tried to trace this one for a long time with no real luck.

  • May I ask what raid controller is installed. We have similare problems on one server. It has adaptec, and I don't know how to check if write caching is off.

  • quote:


    BradleyB: By running a dbcc dbreindex, it takes care of the problem. This might be treating the symptoms instead of fixing the problem. I have tried to trace this one for a long time with no real luck.


    I can duplicate the error and BradleyB is correct by saying it can be prevented by running DBCC DBREINDEX. More importantly, I can duplicate the error on a different server, so that would seem to be evidence against the proposition that it is a hardware issue.

    With nothing up my sleeves, here are the steps I followed to duplicate the error:

    1. Restore the offending database.

    2. Run DBCC CHECKDB against the database; it checks out clean.

    3. Run DBCC SHRINKDATABASE against the database. It runs to completion and tosses no chunks.

    4. Run DBCC CHECKDB against the database again. This time, the chunks start splattering immediately. Here is a very small snippet, for example:

    quote:


    Server: Msg 7995, Level 16, State 1, Line 1

    Database '<name>' consistency errors in sysobjects, sysindexes, syscolumns, or systypes prevent further CHECKDB processing.

    CHECKDB found 2 allocation errors and 0 consistency errors in table 'sysindexes' (object ID 2).

    Server: Msg 8968, Level 16, State 10, Line 1

    Table error: IAM page (1:41415) (object ID 3, index ID 0) is out of the range of this database.

    Server: Msg 8968, Level 16, State 1, Line 1

    Table error: IAM page (1:64147) (object ID 3, index ID 2) is out of the range of this database.

    Server: Msg 8968, Level 16, State 1, Line 1

    Table error: IAM page (1:169) (object ID 6, index ID 0) is out of the range of this database.

    Server: Msg 8968, Level 16, State 1, Line 1

    Table error: IAM page (1:6256) (object ID 9, index ID 0) is out of the range of this database.

    Server: Msg 8968, Level 16, State 1, Line 1

    Table error: IAM page (1:41395) (object ID 12, index ID 0) is out of the range of this database.

    Server: Msg 8968, Level 16, State 1, Line 1

    Table error: IAM page (1:104115) (object ID 12, index ID 2) is out of the range of this database.

    Server: Msg 8968, Level 16, State 1, Line 1

    Table error: IAM page (1:475975) (object ID 6291082, index ID 17) is out of the range of this database.

    Server: Msg 8978, Level 16, State 1, Line 1

    Table error: Object ID 6291082, index ID 18. Page (1:293246) is missing a reference from previous page (1:293247). Possible chain linkage problem.


    And so on, and so forth. If every table is reindexed prior to shrinking, nothing bad like this happens.

    So I have a work-around, but I just don't know what to do with all this information. Microsoft ought to be interested in what I've found, I suppose, but I don't think I can just zip up the offending database and mail it to them (we have sensitive data in it). Any suggestions?

    And thanks for all the valuable input!

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply