• GilaMonster (7/29/2014)


    Start with regularly tested backups and frequent database integrity checks. You should be running integrity checks often enough that it's caught before ending up in the backups. Or, if there isn't a large enough maintenance window, restore the backups to another server and CheckDB that.

    Second, brush up on restore options. A 1TB database is probably in multiple files/filegroups. If the corruption was isolated, you should have been able to restore just the affected file/filegroup. Usually a lot faster than restoring the entire DB and if you have SQL enterprise edition, the restore is 'online' (rest of the database is online and usable). Depending on the extent of the damage, you may have been able to do page restores, restoring just the corrupted pages.

    Once that's in place, then start looking at HA/DR that support your SLAs.

    Thanks Gila, as this is a third-party application there is only 1 filegroup (PRIMARY), and it contains 10 years worth of data, therefore it's size. I will brush up on restore options however.

    Also, as you mentioned, I also have asked our team to investigate the feasibility of increasing our checkdb from weekly to daily (this one takes ~4hrs to run), but we have 100+ other SQL servers that are smaller in size and could run in an hour or less.

    Also, the extent of the damage was pretty bad - over 3500+ consistency errors spread amongst 10 tables. When we finally recovered , I compared my checktable pagecount to the checkdb pagecount initially for those 10 tables and the difference was about 36k pages difference in the restored db versus the corrupted database (so I assume if I did repair_allow_data_loss option I may have lost 36k pages!).