SQL Database corruption issue

Question

SQL Database corruption issue

john.round

Old Hand

Points: 372
More actions
August 17, 2017 at 8:46 am

#338797

We are suddenly getting a lot of errors in a 160GB database along the lines of SQL Server detected a logical consistency-based I/O error
This seems to have started happening around 02:00 on 14/08/17 when a 'Rebuild Index' job failed
Date                      14/08/2017 02:00:00 Log                         Job History (Epro Index Rebuilding)
Step ID                 1 Server                   EPROLIVESQL Job Name                            Epro Index Rebuilding Step Name                         Rebuild Indexes Duration                              00:11:46 Sql Severity        24 Sql Message ID 3621 Operator Emailed            Operator Net sent           Operator Paged                Retries Attempted          0
Message Executed as user: NT AUTHORITY\SYSTEM. SQL Server detected a logical consistency-based I/O error: incorrect pageid (expected 1:20529816; actual 0:0). It occurred during a read of page (1:20529816) in database ID 7 at offset 0x00002728530000 in file 'F:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\EproPat.mdf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online. [SQLSTATE HY000] (Error 824) Executing ALTER INDEX ux_patients_history_versions ON dbo.t_patients_history REBUILD                                 - fragmentation currently 30.2339% [SQLSTATE 01000] (Error 0) Executing ALTER INDEX in_clinic_appointments_patient ON dbo.t_clinic_appointments REBUILD                             - fragmentation currently 29.7339% [SQLSTATE 01000] (Error 0) Executing ALTER INDEX ux_patients_history_previousVersions ON dbo.t_patients_history REBUILD                              - fragmentation currently 29.0101% [SQLSTATE 01000] (Error 0) The statement has been terminated. [SQLSTATE 01000] (Error 3621). The step failed.
I've imported the latest backup onto a test SQL instance and run dbcc_checkdb and attached the results of that.
Not sure what has caused this, or if I can repair it easily!

Viewing 13 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

john.round Old Hand Points: 372 More actions · Answer 1

I have also just done a REPAIR_ALLOW_DATA_LOSS on the test instance of the DB and it came back with the attached results...but I can't tell if any actual user data was lost as part of this repair?

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 2

Yes. Data was lost.

Repairing, especially with allow_data_loss, is a last resort for when there's no other way of fixing things. Best method is to restore from a clean backup. With proper backup strategy, you won't lose any data.
When did CheckDB last run without error?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

john.round Old Hand Points: 372 More actions · Answer 3

Many thanks for the replies,

We do have good backups of the database and Full recovery mode on the DB so hourly transaction logs are generated, so is it a case of restoring the last known good 'full' database backup then restore all the transaction log files after this point up until the most recent in time? (Not had to do something like this in a very long time!) 🙂

SQL_Hacker SSCommitted Points: 1728 More actions · Answer 4

Yes, if you have a good full backup (you may want to restore that full on your test server and run DBCC CheckDB to ensure it's clean), start with restoring that, with NORECOVERY, then (assuming you don't have any differential backups) restore the transaction log backups up to the point where you're sure the database is consistent. The best approach is going to be time consuming, but worth it...you'll want to restore the full backup with RECOVERY, run DBCC CheckDB. Then, drop the database, restore again with NORECOVERY, restore the first t-log backup with RECOVERY, run DBCC CheckDB. Drop database...and repeat until you get to the point where DBCC CheckDB encounters a consistency or allocation error...I think it indicates a time...or you can look it up when it displays the page where the problem is, and then you know where to stop the recovery at...the transaction log either before the problem occurs, or the transaction log where the problem occurs, up to a time before the problem occurs.
The restore options (and how to deal with DBCC CheckDB errors when encountered) can be easily found online.

Alan H
MCSE - Data Management and Analytics
Senior SQL Server DBA
Best way to ask a question: http://www.sqlservercentral.com/articles/Best+Practices/61537/

Eddie Wuerch SSChampion Points: 12542 More actions · Answer 5

john.round - Friday, August 18, 2017 2:47 AM
Many thanks for the replies,
We do have good backups of the database and Full recovery mode on the DB so hourly transaction logs are generated, so is it a case of restoring the last known good 'full' database backup then restore all the transaction log files after this point up until the most recent in time? (Not had to do something like this in a very long time!) 🙂

Yes, that's it:
1. Restore from last known 'good' backup.
2. Restore every transaction log backup taken since then, including taking log backups from the corrupted database before shutting it down.

Please continue taking transaction log backups on the damaged database while you prepare your restore. It has suffered significant loss, but you can recover everything, including capturing all changes made to the database, if you keep taking log backups.

Disk corruption on data files (at least the kind shown in your CHECKDB output) is not written to the transaction log, unless the log file got damaged in the incident. Because the transaction log does not contain the instructions on how to cause the damage to the data files, restoring from a full backup taken before the incident and using log backups from before and after the incident won't include the damage.
If there was damage, but it occurred to an area of the log that had already been backed up, there will be no data loss.

If the event also wiped your backups, then you're pretty much out of options.

Eddie Wuerch
MCM: SQL

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 6

john.round - Friday, August 18, 2017 2:47 AM
Many thanks for the replies,
We do have good backups of the database and Full recovery mode on the DB so hourly transaction logs are generated, so is it a case of restoring the last known good 'full' database backup then restore all the transaction log files after this point up until the most recent in time? (Not had to do something like this in a very long time!) 🙂

Yup, that's pretty much that. Take a final log backup before you start, restore that as the final log restore.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 7

SQL_Hacker - Friday, August 18, 2017 1:24 PM
The best approach is going to be time consuming, but worth it...you'll want to restore the full backup with RECOVERY, run DBCC CheckDB. Then, drop the database, restore again with NORECOVERY, restore the first t-log backup with RECOVERY, run DBCC CheckDB. Drop database...and repeat until you get to the point where DBCC CheckDB encounters a consistency or allocation error...I think it indicates a time...or you can look it up when it displays the page where the problem is, and then you know where to stop the recovery at...the transaction log either before the problem occurs, or the transaction log where the problem occurs, up to a time before the problem occurs.

There's no need to do that.
That kind of 'creep forward' is needed for recovering from data loss (dropped table, accidental delete, etc). Corruption, in the vast majority of cases you can just restore all log backups up to most recent, because a misbehaving IO-subsystem (most common cause of corruption) doesn't write transaction log records for what it mangled

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

SQL_Hacker SSCommitted Points: 1728 More actions · Answer 8

Thanks Gail. I obviously didn't read the original question well enough. Thanks for catching that.

Alan H
MCSE - Data Management and Analytics
Senior SQL Server DBA
Best way to ask a question: http://www.sqlservercentral.com/articles/Best+Practices/61537/

john.round Old Hand Points: 372 More actions · Answer 9

I've had to go back a while to the full database backup at 08/08/17 00:48 which passes DBCC CheckDB with no consistency errors.

So just to confirm the first log after that is at 01:00, then on the hour every hour up until the most recent (at time of post 14:00, then the tail log). I am assuming all the daily full database backups taken from 09/08/17 onwards at 00:48 won't have any impact on the transaction log restores? (Hopefully that makes sense!)

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 10

john.round - Monday, August 21, 2017 7:45 AM
So just to confirm the first log after that is at 01:00, then on the hour every hour up until the most recent (at time of post 14:00, then the tail log). I am assuming all the daily full database backups taken from 09/08/17 onwards at 00:48 won't have any impact on the transaction log restores? (Hopefully that makes sense!)

Correct.

I would recommend, once this is fixed, scheduling more frequent checkDB runs. At least once a week is my preference.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

john.round Old Hand Points: 372 More actions · Answer 11

So, I successfully tested yesterday on a test instance using the following script (as there are over 250 transaction log files!)

RESTORE DATABASE [EproPat] FROM DISK = 'D:\tmp\EproPat_backup_2017_08_08_000106_1477461.bak' WITH NORECOVERY, REPLACE
RESTORE LOG [EproPat] FROM DISK = 'D:\tmp\EproPat_backup_2017_08_08_010007_1198079.trn' WITH NORECOVERY
RESTORE LOG [EproPat] FROM DISK = 'D:\tmp\EproPat_backup_2017_08_08_020006_9747574.trn' WITH NORECOVERY
RESTORE LOG [EproPat] FROM DISK = 'D:\tmp\EproPat_backup_2017_08_08_030005_2635503.trn' WITH NORECOVERY
(and so on for all the other transaction logs up until the latest one)
RESTORE DATABASE [EproPat] WITH RECOVERY

So, on the live system if I wait for the, for example, midnight (00:00) transaction log backup to complete then kick off the restore script at 00:01 there is no need for a 'tail log' backup to take place? (I choose midnight as an example as nobody would be using the system at this time!) then once the database was up and running again it would be consistent as of the latest 00:00 transaction log?

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 12

Tail log is to ensure that absolutely nothing is missed, taking it switches the source DB into RESTORING state, so that nothing more can be done to it.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass