Disappearing DBCC Errors

  • Hello All,

    Environment:

    Windows 2000 sp4

    SQL Server 2000 sp4 build 2187

    EMC SAN attached storage.

    We cannot move off this platform until the third-party vendor certifies their application. 🙁

    We receive the following errors when DBCC runs prior to backing up the database. Once the error is generated, maintenance halts. We then run DBCC manually with the ALL_ERRORMSGS option and the errors disappear.

    Problem is occurring once a week.

    Different tables each occurrence.

    Always the same database. Other databases on the server share the same physical disks and show no problems.

    We have asked the hardware team to run diagnostics and check all the firmware and drivers. They are very reluctant because there are no errors found in the event logs and HP integrated log viewer.

    We have asked the SAN team to look for any issues. They replied that no errors were found in the HBA, PowerPath logs, fabric, etc.

    Any guidance would be appreciated. I also plan on running SQLIOSim to hammer the server.

    Msg 2533, Sev 16: Table error: Page (1:303495) allocated to object ID 2105058535, index ID 0 was not seen. Page may be invalid or have incorrect object ID information in its header. [SQLSTATE 42000]

    Msg 2533, Sev 16: Table error: Page (1:303512) allocated to object ID 2105058535, index ID 0 was not seen. Page may be invalid or have incorrect object ID information in its header. [SQLSTATE 42000]

    Msg 0, Sev 16: DBCC For Database: QSDB Started at: Apr 1 2009 7:52AM [SQLSTATE 01000]

    Msg 2533, Sev 16: Table error: Page (1:303513) allocated to object ID 2105058535, index ID 0 was not seen. Page may be invalid or have incorrect object ID information in its header. [SQLSTATE 42000]

    Msg 2533, Sev 16: Table error: Page (1:303514) allocated to object ID 2105058535, index ID 0 was not seen. Page may be invalid or have incorrect object ID information in its header. [SQLSTATE 42000]

    Msg 2533, Sev 16: Table error: Page (1:303515) allocated to object ID 2105058535, index ID 0 was not seen. Page may be invalid or have incorrect object ID information in its header. [SQLSTATE 42000]

    Msg 2533, Sev 16: Table error: Page (1:303516) allocated to object ID 2105058535, index ID 0 was not seen. Page may be invalid or have incorrect object ID information in its header. [SQLSTATE 42000]

    Msg 2533, Sev 16: Table error: Page (1:303517) allocated to object ID 2105058535, index ID 0 was not seen. Page may be invalid or have incorrect object ID information in its header. [SQLSTATE 42000]

    Msg 2533, Sev 16: Table error: Page (1:303518) allocated to object ID 2105058535, index ID 0 was not seen. Page may be invalid or have incorrect object ID information in its header. [SQLSTATE 42000]

    Msg 8990, Sev 16: CHECKDB found 0 allocation errors and 10 consistency errors in table 'GroupMembersTemp' (object ID 2105058535). [SQLSTATE 01000]

    Msg 8989, Sev 16: CHECKDB found 0 allocation errors and 10 consistency errors in database 'QSDB'. [SQLSTATE 01000]

    Msg 8958, Sev 16: repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (QSDB ). [SQLSTATE 01000]
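Since a different table is hit on each occurrence, it can help to reduce each failed run to a short list of affected objects and page ranges that can be compared week to week. The following is only a minimal sketch, assuming the job output is saved as text in the same format as the messages above; `parse_2533`, `page_ranges` and `SAMPLE` are hypothetical names, not part of any SQL Server tooling.

```python
import re
from collections import defaultdict

# Matches "Msg 2533" lines in the format shown in the job output above.
MSG_2533 = re.compile(
    r"Msg 2533, Sev 16: Table error: Page \((\d+):(\d+)\) "
    r"allocated to object ID (\d+), index ID (\d+) was not seen"
)


def parse_2533(log_text):
    """Return {(object_id, index_id): sorted list of (file_id, page_id)}."""
    found = defaultdict(set)
    for m in MSG_2533.finditer(log_text):
        file_id, page_id, object_id, index_id = map(int, m.groups())
        found[(object_id, index_id)].add((file_id, page_id))
    return {key: sorted(pages) for key, pages in found.items()}


def page_ranges(pages):
    """Collapse sorted (file_id, page_id) pairs into (file_id, first, last) runs."""
    runs = []
    for file_id, page_id in pages:
        if runs and runs[-1][0] == file_id and runs[-1][2] + 1 == page_id:
            runs[-1][2] = page_id  # extends the current contiguous run
        else:
            runs.append([file_id, page_id, page_id])
    return [tuple(run) for run in runs]


# Three of the messages quoted above, used as sample input.
SAMPLE = """
Msg 2533, Sev 16: Table error: Page (1:303495) allocated to object ID 2105058535, index ID 0 was not seen. Page may be invalid or have incorrect object ID information in its header. [SQLSTATE 42000]
Msg 2533, Sev 16: Table error: Page (1:303512) allocated to object ID 2105058535, index ID 0 was not seen. Page may be invalid or have incorrect object ID information in its header. [SQLSTATE 42000]
Msg 2533, Sev 16: Table error: Page (1:303513) allocated to object ID 2105058535, index ID 0 was not seen. Page may be invalid or have incorrect object ID information in its header. [SQLSTATE 42000]
"""

if __name__ == "__main__":
    for (object_id, index_id), pages in parse_2533(SAMPLE).items():
        print(object_id, index_id, page_ranges(pages))
```

Contiguous runs of pages on one object, as opposed to scattered single pages, are the kind of detail worth passing to the storage team along with the timestamp of each run.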

    Thanks in Advance,

    Paul Randal, hope you're out there. 😀

  • Have you got a clean backup?

    Page (1:303495), Page (1:303512), Page (1:303513), Page (1:303514), Page (1:303515), Page (1:303516), Page (1:303517), Page (1:303518)

    You have corruption on the pages listed above. If this were SQL 2005, you could do a page-level restore of just those pages.

    Check out Paul's post on the same:

    http://blogs.msdn.com/sqlserverstorageengine/archive/2007/01/18/fixing-damaged-pages-using-page-restore-or-manual-inserts.aspx

    There he explains clearly how to restore the backup under a different name and find the range of rows that are corrupted.

    You would have to run repair_allow_data_loss, which will lose some data, but you don't have to worry, as you can copy the same data back over from the restored copy of the database that you have just made.

  • I appreciate your response. When rerunning DBCC CHECKDB the output displays no corruption.

  • Alfredo Giotti (4/1/2009)


    I appreciate your response. When rerunning DBCC CHECKDB the output displays no corruption.

    Alright! Did you fix that? Or have you made a restore from a clean backup? Could you give more insight into this please?

  • In my original post I mentioned that the error is generated when the nightly maintenance runs DBCC CHECKDB prior to backing up the database. Soon after, we run DBCC CHECKDB again to determine the next course of action, but the errors are gone. So no, I have not fixed anything, nor have I restored the database. The problem will resurface in a few days and the same scenario will play out.

  • It sounds like you have intermittent IO errors. Are you seeing any errors in the SQL error log or the windows error log?

    SAN or direct attached storage? Raid level?

    If I was in your position, I'd be planning to move the DB to different drives before the errors become permanent.

    Also, what version of SQL 2000? (select @@version)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hello Gail,

    The server environment is listed in my original post above.

    I am trying to obtain the SAN configuration. I inherited this server.

    There are no errors in the SQL Server logs, Event logs, or HP (Compaq Proliant) logs.

    If you take a look at my original post you will notice the battle I am facing with our internal hardware and storage teams. The issue I am facing is that we, the DBA team, must always prove that it is not SQL Server.

    In any case, I really appreciate your response and everyone else's.

    I plan to run SQLIOSim to help determine the cause.

  • Corruption 90% of the time is a hardware issue.

    http://www.sqlskills.com/BLOGS/PAUL/post/How-to-tell-if-the-IO-subsystem-is-causing-corruptions.aspx

    http://www.sqlskills.com/BLOGS/PAUL/post/Search-Engine-QA-26-Myths-around-causing-corruption.aspx

    Gail Shaw
  • Your corruptions are 'disappearing' because the rest of your nightly maintenance is doing stuff like rebuilding indexes (I'd guess), which deallocates the corrupt pages and moves the indexes to new pages. This gives the effect of the corruptions disappearing. See the first entry in my SQL Q&A column in the April 2009 issue of TechNet Magazine.

    Regardless of whether the errors are disappearing or not, you've got IO subsystem problems that are causing this. See the previously posted links for more details.

    Thanks

    Paul Randal
    CEO, SQLskills.com: Check out SQLskills online training!
    Blog:www.SQLskills.com/blogs/paul Twitter: @PaulRandal
    SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
    Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005

  • Hi Gail and Paul,

    Thanks for the great feedback.

    The IT hardware team finally reviewed the server and found that the array controller has outdated drivers, more than 5 years old. They are going to update these drivers, and at that point I will run some tests.

    I will keep you all posted on the outcome.

    P.S. My index maintenance job runs prior to the DBCC checks and the full backup.

  • Hi Gail and Paul,

    Just wanted to follow up. Our internal hardware and storage teams insist there are no issues with any of the drivers or hardware between the server and the SAN disks, even though they tested nothing. This has been my frustration over the years working for this company. Therefore, every attempt I make to prove to them that it is not SQL Server fails. I have run SQLIOSim, but it fails to execute with VIRTUAL PROTECT errors. It is my understanding that SQLIOSim has some bugs when running on Windows 2000. I even extracted the version of SQLIOSim that is installed in the SQL 2008 Binn folder, C:\Program Files\Microsoft SQL Server\MSSQL10.INST1\MSSQL\Binn, and gave that a try. No luck, same VP errors.

    Any assistance is greatly appreciated.

    Regards

    Fred

  • Hello Gail and Paul,

    I have some more detailed information regarding this problem.

    --------------------------------------------------------------------------

    Recap:

    Error message during DBCC CHECKDB:

    Msg 2533, Sev 16: Table error: Page (1:717080) allocated to object ID 181575685, index ID 0 was not seen. Page may be invalid or have incorrect object ID information in its header. [SQLSTATE 42000]

    Msg 0, Sev 16: DBCC For Database: QSDB Started at: Jun 12 2009 7:57AM [SQLSTATE 01000]

    Running DBCC CHECKDB manually reports no issues.

    Very important note: every time DBCC throws these messages, there is an application process hitting this database, and this particular table, at the same time DBCC is executing. I am still trying to gather details about this process. What I do know is that it is performing inserts.

    --------------------------------------------------------------------------

    Based on the output from DBCC, I decided to check whether the page DBCC complained about is actually missing. It is not: it contains data, and I am able to query the data on the page without issues.

    NOTE: there were 100 errors related to the same object and all pages were available to query against.
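When asking a storage team to look at a specific region of a data file, a page reference from the DBCC output can be turned into a byte offset. A minimal sketch, assuming the standard SQL Server page size of 8 KB (8192 bytes) and that the first number in the reference is the database file ID; `parse_page_ref` and `page_byte_range` are hypothetical helper names.

```python
import re

PAGE_SIZE = 8192  # SQL Server data pages are 8 KB


def parse_page_ref(ref):
    """Split a DBCC page reference such as '(1:717080)' into (file_id, page_id)."""
    m = re.fullmatch(r"\(?\s*(\d+)\s*:\s*(\d+)\s*\)?", ref.strip())
    if m is None:
        raise ValueError(f"not a page reference: {ref!r}")
    return int(m.group(1)), int(m.group(2))


def page_byte_range(page_id):
    """Return (start, end_exclusive) byte offsets of a page within its data file."""
    start = page_id * PAGE_SIZE
    return start, start + PAGE_SIZE


if __name__ == "__main__":
    file_id, page_id = parse_page_ref("(1:717080)")
    start, end = page_byte_range(page_id)
    print(f"file {file_id}, page {page_id}: bytes {start}..{end - 1}")
```

The offset is relative to the start of the data file identified by the file ID, not to the LUN or physical disk, so the storage team would still need to map it through the file system and SAN layout.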

    The DBCC output also mentions index ID 0; there is no clustered index on this table. With all that said, here is my question: because the table has no clustered index, is it possible that DBCC reads the PFS page, which identifies the pages as allocated, but when DBCC actually needs to read a page it cannot be found because the application process is in a transaction that has not committed? And is the reason we do not encounter the corruption when we manually run DBCC against this table that, by then, the process has finished and committed?

    NOTE: We have a case open with Microsoft regarding this issue. I have not yet conveyed this information to them, but will do so shortly.

    The corruption errors continue to surface even after all the drivers have been updated.

  • Nope - that can't happen with DBCC in 2000 and beyond. It's a stale read issue on your I/O subsystem, or something similar, or you're doing something and by the time DBCC runs, the corrupt pages have been deallocated. I don't know how else to say this - it's not DBCC, it's your hardware.

    Thanks

    Paul Randal
