|
|
|
SSC Rookie
      
Group: General Forum Members
Last Login: Yesterday @ 6:43 AM
Points: 42,
Visits: 212
|
|
I have a 2005, SP4, 64 bit, Enterprise edition server. Our Netapp crashed suddenly and when it was brought back up DBCC CHECKDB found a corruption in a table. After dropping an index and rebuilding it the CRC errors stopped. Problem solved, except that BACKUPS now fail for this database (with the same CRC errors) when the process gets to the bad block in the data file. Netapp is proposing that they zero out that block (overwrite the block with all zeros) to correct this.
Will this work and will it affect the database in any way? My initial thought is that the database should be fine as this block is not currently in use by any objects, but my boss is afraid of further corrupting the data file.
Your thoughts?
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Today @ 12:25 AM
Points: 37,734,
Visits: 29,999
|
|
If the block belongs to an allocated page, that will cause further corruption.
What's the error on the backups and what does the following return?
DBCC CHECKDB (<Database Name>) WITH NO_INFOMSGS, ALL_ERRORMSGS
Gail Shaw Microsoft Certified Master: SQL Server 2008, MVP SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
We walk in the dark places no others will enter We stand on the bridge and no one may pass
|
|
|
|
|
SSChampion
        
Group: General Forum Members
Last Login: Yesterday @ 8:29 PM
Points: 11,645,
Visits: 27,738
|
|
bad idea. bad bad.
first, don't do anything like take the database offline or anything without expert advice; Gail Shaw monitors this forum frequently.. run DBCC CHECKDB('db name') WITH NO_INFOMSGS, ALL_ERRORMSGS and post the results here.
with that, we can tell you a lot more.
you might be able to do anything from page level restore to get the data back, maybe migrate the data out to a new table to fix it, and many other possibilities.there's just too many possibilities without .
letting ANYTHING other than SQL server touch the bits and bytes of the data will guarantee corruption and loss of data.
Lowell
--There is no spoon, and there's no default ORDER BY in sql server either. Actually, Common Sense is so rare, it should be considered a Superpower. --my son
|
|
|
|
|
SSC Rookie
      
Group: General Forum Members
Last Login: Yesterday @ 6:43 AM
Points: 42,
Visits: 212
|
|
Here's the error from the Backup: BackupIoRequest::WaitForIoCompletion: read failure on backup device 'T:\Data\T2KAdmin.mdf'. Operating system error 23(Data error (cyclic redundancy check).).
We're going to be running a DBCC CHECKDB but it takes 8 hours to complete (305 GB DB).
We had another idea: since this page is now not used by any object since rebuilding the index, if we shrink the data file that disk block could be reallocated back to the operating system and not be in the data file at all any longer. Then our backups should work, right?
|
|
|
|
|
SSC-Dedicated
           
Group: Administrators
Last Login: Yesterday @ 4:46 PM
Points: 31,433,
Visits: 13,745
|
|
It depends on where the block is. If you have this:
data data data data data bad block (free space) (free space)
you could be fine. If you have this:
data data (bad block) (free space) (data) (data) (free space)
you might end up sending another object to the bad block. SQL Server isn't intelligent about the shrink. It reduces the size of the file, moving extents from the end to the beginning.
Ideally, I'd think about copying the mdf file over, but I'd be worried about shutting down the server right now. At the least, I'd get dbcc to finish, and I'd also look at bcp'ing out all data, and I mean all data into files. Drop each table into a separate file and check them for integrity. Worst case, you have to BCP back in, but that's easily scriptable.
I'd also script out all schema. Something like Red Gate's SQL Compare makes this really easy and you could transfer stuff to a shell db if you have the tool.
disclosure: I work for Red Gate.
Follow me on Twitter: @way0utwest
 Forum Etiquette: How to post data/code on a forum to get the best help
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Today @ 12:25 AM
Points: 37,734,
Visits: 29,999
|
|
Maybe, but I wouldn't want to take any chances with this. You don't know for sure that the page is unallocated. If it wasn't allocated, the backup shouldn't be breaking as backups only read allocated extents and if the only corruption was in that index, dropping the index would have deallocated the extent.
p.s. CheckDB Taking 8 hours on a 300GB database? That sounds like a serious IO contention problem in and of itself. I've done checkDB of TB sized databases faster than that.
Gail Shaw Microsoft Certified Master: SQL Server 2008, MVP SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
We walk in the dark places no others will enter We stand on the bridge and no one may pass
|
|
|
|
|
SSC Rookie
      
Group: General Forum Members
Last Login: Yesterday @ 6:43 AM
Points: 42,
Visits: 212
|
|
GilaMonster (5/31/2012) CheckDB Taking 8 hours on a 300GB database? That sounds like a serious IO contention problem in and of itself. I've done checkDB of TB sized databases faster than that.
I suspect that there are still Netapp issues to be resolved. Since the storage crash and recovery the server has been running at approximately 75% CPU while before the crash it was the normal 20% - 40%.
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Today @ 12:25 AM
Points: 37,734,
Visits: 29,999
|
|
May I suggest alternate storage if at all possible...
Gail Shaw Microsoft Certified Master: SQL Server 2008, MVP SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
We walk in the dark places no others will enter We stand on the bridge and no one may pass
|
|
|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: 2 days ago @ 12:47 PM
Points: 100,
Visits: 346
|
|
Are you using Snap Manager for SQL? If so, I'd suggest trying to do a native backup to see if it's NetApp that's misinformed regarding the state of that block (it did crash, after all) or whether there's truly something wrong from SQL's perspective. If the native backup works, try restoring it somewhere else (same instance would be fine, but I'd suggest a different LUN, on different drives if possible). If *that* works, I would try for a cutover to that new db (after restoring again and applying logs as necessary).
Good luck. I've dealt with NetApp and SQL Server and will say that it has a bit of a Pinocchio syndrome: when it's good, it's very good. But when it's bad...
|
|
|
|
|
SSC Rookie
      
Group: General Forum Members
Last Login: Yesterday @ 6:43 AM
Points: 42,
Visits: 212
|
|
| Unfortunately is the native SQL back up that is failing.
|
|
|
|