Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 123»»»

Zeroing out bad block? Expand / Collapse
Author
Message
Posted Thursday, May 31, 2012 2:18 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Friday, July 11, 2014 9:28 AM
Points: 47, Visits: 239
I have a 2005, SP4, 64 bit, Enterprise edition server. Our Netapp crashed suddenly and when it was brought back up DBCC CHECKDB found a corruption in a table. After dropping an index and rebuilding it the CRC errors stopped. Problem solved, except that BACKUPS now fail for this database (with the same CRC errors) when the process gets to the bad block in the data file. Netapp is proposing that they zero out that block (overwrite the block with all zeros) to correct this.

Will this work and will it affect the database in any way? My initial thought is that the database should be fine as this block is not currently in use by any objects, but my boss is afraid of further corrupting the data file.

Your thoughts?
Post #1309349
Posted Thursday, May 31, 2012 2:28 PM


SSC-Forever

SSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-Forever

Group: General Forum Members
Last Login: Today @ 7:54 AM
Points: 42,768, Visits: 35,867
If the block belongs to an allocated page, that will cause further corruption.

What's the error on the backups and what does the following return?
DBCC CHECKDB (<Database Name>) WITH NO_INFOMSGS, ALL_ERRORMSGS




Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #1309352
Posted Thursday, May 31, 2012 2:28 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 5:23 AM
Points: 12,904, Visits: 31,979
bad idea. bad bad.

first, don't do anything like take the database offline or anything without expert advice; Gail Shaw monitors this forum frequently..
run DBCC CHECKDB('db name') WITH NO_INFOMSGS, ALL_ERRORMSGS and post the results here.

with that, we can tell you a lot more.

you might be able to do anything from page level restore to get the data back, maybe migrate the data out to a new table to fix it, and many other possibilities.there's just too many possibilities without .

letting ANYTHING other than SQL server touch the bits and bytes of the data will guarantee corruption and loss of data.


Lowell

--There is no spoon, and there's no default ORDER BY in sql server either.
Actually, Common Sense is so rare, it should be considered a Superpower. --my son
Post #1309353
Posted Thursday, May 31, 2012 2:54 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Friday, July 11, 2014 9:28 AM
Points: 47, Visits: 239
Here's the error from the Backup:
BackupIoRequest::WaitForIoCompletion: read failure on backup device 'T:\Data\T2KAdmin.mdf'. Operating system error 23(Data error (cyclic redundancy check).).

We're going to be running a DBCC CHECKDB but it takes 8 hours to complete (305 GB DB).

We had another idea: since this page is now not used by any object since rebuilding the index, if we shrink the data file that disk block could be reallocated back to the operating system and not be in the data file at all any longer. Then our backups should work, right?
Post #1309367
Posted Thursday, May 31, 2012 3:38 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: Administrators
Last Login: Today @ 8:12 AM
Points: 33,166, Visits: 15,300
It depends on where the block is. If you have this:


data data data data data bad block (free space) (free space)

you could be fine. If you have this:

data data (bad block) (free space) (data) (data) (free space)

you might end up sending another object to the bad block. SQL Server isn't intelligent about the shrink. It reduces the size of the file, moving extents from the end to the beginning.

Ideally, I'd think about copying the mdf file over, but I'd be worried about shutting down the server right now. At the least, I'd get dbcc to finish, and I'd also look at bcp'ing out all data, and I mean all data into files. Drop each table into a separate file and check them for integrity. Worst case, you have to BCP back in, but that's easily scriptable.

I'd also script out all schema. Something like Red Gate's SQL Compare makes this really easy and you could transfer stuff to a shell db if you have the tool.

disclosure: I work for Red Gate.







Follow me on Twitter: @way0utwest

Forum Etiquette: How to post data/code on a forum to get the best help
Post #1309392
Posted Thursday, May 31, 2012 3:38 PM


SSC-Forever

SSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-Forever

Group: General Forum Members
Last Login: Today @ 7:54 AM
Points: 42,768, Visits: 35,867
Maybe, but I wouldn't want to take any chances with this. You don't know for sure that the page is unallocated. If it wasn't allocated, the backup shouldn't be breaking as backups only read allocated extents and if the only corruption was in that index, dropping the index would have deallocated the extent.

p.s. CheckDB Taking 8 hours on a 300GB database? That sounds like a serious IO contention problem in and of itself. I've done checkDB of TB sized databases faster than that.



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #1309394
Posted Thursday, May 31, 2012 4:05 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Friday, July 11, 2014 9:28 AM
Points: 47, Visits: 239
GilaMonster (5/31/2012)
CheckDB Taking 8 hours on a 300GB database? That sounds like a serious IO contention problem in and of itself. I've done checkDB of TB sized databases faster than that.


I suspect that there are still Netapp issues to be resolved. Since the storage crash and recovery the server has been running at approximately 75% CPU while before the crash it was the normal 20% - 40%.
Post #1309411
Posted Thursday, May 31, 2012 4:09 PM


SSC-Forever

SSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-Forever

Group: General Forum Members
Last Login: Today @ 7:54 AM
Points: 42,768, Visits: 35,867
May I suggest alternate storage if at all possible...


Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #1309416
Posted Thursday, May 31, 2012 4:30 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Wednesday, August 13, 2014 3:41 PM
Points: 100, Visits: 402
Are you using Snap Manager for SQL? If so, I'd suggest trying to do a native backup to see if it's NetApp that's misinformed regarding the state of that block (it did crash, after all) or whether there's truly something wrong from SQL's perspective. If the native backup works, try restoring it somewhere else (same instance would be fine, but I'd suggest a different LUN, on different drives if possible). If *that* works, I would try for a cutover to that new db (after restoring again and applying logs as necessary).

Good luck. I've dealt with NetApp and SQL Server and will say that it has a bit of a Pinocchio syndrome: when it's good, it's very good. But when it's bad...
Post #1309427
Posted Friday, June 1, 2012 6:59 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Friday, July 11, 2014 9:28 AM
Points: 47, Visits: 239
Unfortunately is the native SQL back up that is failing.
Post #1309699
« Prev Topic | Next Topic »

Add to briefcase 123»»»

Permissions Expand / Collapse