Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12

'Best' HA/DR option to protect against data corruption Expand / Collapse
Author
Message
Posted Tuesday, July 29, 2014 10:02 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Monday, October 20, 2014 1:04 PM
Points: 25, Visits: 161
GilaMonster (7/29/2014)
Start with regularly tested backups and frequent database integrity checks. You should be running integrity checks often enough that it's caught before ending up in the backups. Or, if there isn't a large enough maintenance window, restore the backups to another server and CheckDB that.

Second, brush up on restore options. A 1TB database is probably in multiple files/filegroups. If the corruption was isolated, you should have been able to restore just the affected file/filegroup. Usually a lot faster than restoring the entire DB and if you have SQL enterprise edition, the restore is 'online' (rest of the database is online and usable). Depending on the extent of the damage, you may have been able to do page restores, restoring just the corrupted pages.

Once that's in place, then start looking at HA/DR that support your SLAs.


Thanks Gila, as this is a third-party application there is only 1 filegroup (PRIMARY), and it contains 10 years worth of data, therefore it's size. I will brush up on restore options however.

Also, as you mentioned, I also have asked our team to investigate the feasibility of increasing our checkdb from weekly to daily (this one takes ~4hrs to run), but we have 100+ other SQL servers that are smaller in size and could run in an hour or less.

Also, the extent of the damage was pretty bad - over 3500+ consistency errors spread amongst 10 tables. When we finally recovered , I compared my checktable pagecount to the checkdb pagecount initially for those 10 tables and the difference was about 36k pages difference in the restored db versus the corrupted database (so I assume if I did repair_allow_data_loss option I may have lost 36k pages!).
Post #1597380
Posted Tuesday, July 29, 2014 10:17 AM


SSC-Forever

SSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-Forever

Group: General Forum Members
Last Login: Today @ 5:47 AM
Points: 40,207, Visits: 36,614
I said nothing implying you should have repaired...

See if you can get some automated restore tests set up. Restore backups on another server, run CheckDB. Off the prod server means it doesn't matter how long they take, does mean you will know if your backups are good.

See if there's any way you can archive older data into tables on another filegroup or another database entirely. Generally with a DB like that, the active portion is tiny compared to the entire DB.



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #1597385
Posted Tuesday, July 29, 2014 10:23 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Monday, October 20, 2014 1:04 PM
Points: 25, Visits: 161
GilaMonster (7/29/2014)
I said nothing implying you should have repaired...

See if you can get some automated restore tests set up. Restore backups on another server, run CheckDB. Off the prod server means it doesn't matter how long they take, does mean you will know if your backups are good.

See if there's any way you can archive older data into tables on another filegroup or another database entirely. Generally with a DB like that, the active portion is tiny compared to the entire DB.


Ah I should have mentioned we had multiple 'solutions' occurring in parallel and repair_allow_data_loss was one of them as an absolute last resort.

And yes, I managed to archive one table (yippee) into another database last year but their biggest table, an audit table, taking 300GB out of the 1TB, cannot be touched according to the business, they want all history in there..and accessible if needed..ugh.

And yes (again), doing more frequent checkdbs is definitely being looked at..thanks again everyone.
Post #1597391
« Prev Topic | Next Topic »

Add to briefcase ««12

Permissions Expand / Collapse