DBCC CHECKDB wont run - Interesting Challenge

  • Hi All,

    Very long time since i was last here.

    Need some advice with this challenge, hopefully Gail or Paul might be around?.

    SQL 2005 SP2

    Dedicated NAS - about 8 years old

    3.5TB database split over multiple volumes, largest datafile is 2.3TB.

    operating system is windows 2003

    There have been issues with sql database consistency problems being reported for quite a while, checks on the storage indicate no problem, but I am not convinced. Database is still usable and is a production server, but the server is restarted/rebooted on what used to be weekly, to pretty much every day and recently 3-4 + times a day now.

    got about 200K messages like the following within the last 24 hours, although this has been ongoing for a long time.

    SQL Server detected a logical consistency-based I/O error: torn page (expected signature: 0xaaaaaaaa; actual signature: 0x75cac85e). It occurred during a read of page (6:91296219) in database ID 5 at offset 0x0000ae223b6000 in file 'I:\.ndf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

    SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0x784b5a48; actual: 0xe8681f5f). It occurred during a read of page (6:91088683) in database ID 5 at offset 0x0000adbce56000 in file 'I:\.ndf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

    SQL Server detected a logical consistency-based I/O error: incorrect pageid (expected 6:91296185; actual 2057:151455495). It occurred during a read of page (6:91296185) in database ID 5 at offset 0x0000ae22372000 in file 'I:df'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

    basically error 824 serverity 24 state 2

    all within the same datafile, and ironically one that has autogrowth switched off.

    msdb..suspect_pages currently has 405 rows with the errorcount for each row varying from 1 to 100k

    when i try to run dbcc checkdb with any options i get

    Msg 1823, Level 16, State 2, Line 1

    A database snapshot cannot be created because it failed to start.

    Msg 5123, Level 16, State 1, Line 1

    CREATE FILE encountered operating system error 1450(error not found) while attempting to open or create the physical file 'H:\.mdf:MSSQL_DBCC11'.

    Msg 7928, Level 16, State 1, Line 1

    The database snapshot for online checks could not be created. Either the reason is given in a previous error or one of the underlying volumes does not support sparse files or alternate streams. Attempting to get exclusive access to run checks offline.

    Msg 5030, Level 16, State 12, Line 1

    The database could not be exclusively locked to perform the operation.

    Msg 7926, Level 16, State 1, Line 1

    Check statement aborted. The database could not be checked as a database snapshot could not be created and the database or table could not be locked. See Books Online for details of when this behavior is expected and what workarounds exist. Also see previous errors for more details.

    Is causing me a few headaches, no backups exist for this database, dont get me started 😛

    database is absolutely critical to the business

    Really need advice from someone who understands DBCC CHECKDB extremely well, as I am not sure if the reason DBCC CHECKDB doesnt run, could be down to insufficent space to create the internal snapshot, if someone could give a rough estimate on how much space is required, for a 3.5 TB database, that would be great and bad in a way, as I cannot add any more storage to this server.

    Seriously hoping i dont need to go down the DBCC PAGE and IND route, with hex editior in hand but have endless supply of caffeine if required. 😉

    Any helpful advice is always appreciated, if you need more info, just let me know.

    Cheers

    Darren

    new update, think i might be in new world of pain

    managed to work with dbcc checktable briefly, still errors out but recommended repair is allow data loss.

    done it with estimateonly with main table, need close to 300GB for tempdb, that drive has 2GB free and cannot be increased.

    --------------------------------------------------------------------------------------
    [highlight]Recommended Articles on How to help us help you and[/highlight]
    [highlight]solve commonly asked questions[/highlight]

    Forum Etiquette: How to post data/code on a forum to get the best help by Jeff Moden[/url]
    Managing Transaction Logs by Gail Shaw[/url]
    How to post Performance problems by Gail Shaw[/url]
    Help, my database is corrupt. Now what? by Gail Shaw[/url]

  • DBCC CHECKDB creates a database snapshot by its own. In your case this is failing.

    Are you able to create a snapshot yourself? If so, you can run the DBCC CHECKDB on that snapshot. If not, maybe the error messages give you more information...

    ** Don't mistake the ‘stupidity of the crowd’ for the ‘wisdom of the group’! **
  • okay nice idea in theory, but space is limited and database snapshots are only supported in enterprise edition, the edition, this is on, is workgroup edition.

    But thanks for the reply.

    --------------------------------------------------------------------------------------
    [highlight]Recommended Articles on How to help us help you and[/highlight]
    [highlight]solve commonly asked questions[/highlight]

    Forum Etiquette: How to post data/code on a forum to get the best help by Jeff Moden[/url]
    Managing Transaction Logs by Gail Shaw[/url]
    How to post Performance problems by Gail Shaw[/url]
    Help, my database is corrupt. Now what? by Gail Shaw[/url]

  • Are you editing out the filenames in those messages, or is SQL really reporting extensions without names?

    Operating system error 1450: http://blogs.msdn.com/b/psssql/archive/2008/07/10/sql-server-reports-operating-system-error-1450-or-1452-or-665-retries.aspx, so you may not be able to use a snapshot at all, if the data changes are exceeding what's allowed for a snapshot

    Can you make enough downtime to run an 'offline' checkDB?

    If so, take the DB into single user mode and run the following (exactly the following please)

    DBCC CheckDB ('<database name>') WITH NO_INFOMSGS, ALL_ERRORMSGS, TABLOCK

    Mission critical database, no backups, long term corruption problems? :crazy:

    Repeated corruption indicates a problem with the IO subsystem. Doesn't necessarily mean the disks, could be anything in the IO stack from filter drivers right the way down.

    If that was my DB, I'd be looking to get it onto alternate storage ASAP (even if that means performance degradation) and getting a backup taken by any means necessary. As it is, you stand a fair chance of losing that DB entirely. Seen it happen.

    Fixing the corruption is not your primary goal here, getting the DB onto stable storage is. Otherwise you could repair and be back in this spot tomorrow.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks for the quick reply Gail.

    Manually removed the filenames, confidential system.

    Server is basically being rebooted on a 3 hourly basis atm.

    Just quickly took opportunity to run the dbcc command you suggested, it ran for a minute without any problems, so will have to engage business today and see if we can schedule downtime tonight perhaps.

    however doesnt using tablock, switch to using tempdb for the work that is carried out.

    only reason i ask that, is if i do a dbcc check table with the tablock and estimateonly, for the biggest table in the database which has about 880 million rows, it shows 300GB estimate for TempDB.

    unfortunately tempdb is on drive with only 2GB free. there is no way to increase storage, unless i make the drastic step and move tempdb on to one of the other volumes which has 300GB+ free capacity

    server has dedicated NAS and therefore is not on normal SAN, Netbackup is how the databases are backed up. Netbackup cannot be used apparently so the only way to backup the database is to P2V and they can change the storage but if they do that, only one copy can be active as the server name will be same. so complicated.

    But i totally agree with what you are saying, does the suspect_pages table in msdb, indicate data corruption, or are they purely read attempts that failed and have got the pages marked as suspect.

    --------------------------------------------------------------------------------------
    [highlight]Recommended Articles on How to help us help you and[/highlight]
    [highlight]solve commonly asked questions[/highlight]

    Forum Etiquette: How to post data/code on a forum to get the best help by Jeff Moden[/url]
    Managing Transaction Logs by Gail Shaw[/url]
    How to post Performance problems by Gail Shaw[/url]
    Help, my database is corrupt. Now what? by Gail Shaw[/url]

  • dont mean to double post.

    just as an update, because the server is rebooted every hour or so atm, cannot do a netbackup as that will take about 20 hours, a sql native backup would take roughtly 10 hours and dbcc checkdb wouldnt be able to finish within an hour, so p2v is looking like the only viable solution atm.

    Thanks for your help Gail anyways

    --------------------------------------------------------------------------------------
    [highlight]Recommended Articles on How to help us help you and[/highlight]
    [highlight]solve commonly asked questions[/highlight]

    Forum Etiquette: How to post data/code on a forum to get the best help by Jeff Moden[/url]
    Managing Transaction Logs by Gail Shaw[/url]
    How to post Performance problems by Gail Shaw[/url]
    Help, my database is corrupt. Now what? by Gail Shaw[/url]

  • Silverfox (3/11/2014)


    however doesnt using tablock, switch to using tempdb for the work that is carried out.

    No. It switches to using table locks for stability rather than a database snapshot.

    But i totally agree with what you are saying, does the suspect_pages table in msdb, indicate data corruption, or are they purely read attempts that failed and have got the pages marked as suspect.

    In this case, they're error 824, meaning the read request succeeded but the page returned to SQL by the OS was invalid. Failed checksum or other damage. So either the page has been altered on disk after SQL wrote it out, or the read operations are mangling the data in flight.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply