SQL Database corruption issue

  • We are suddenly getting a lot of errors in a 160GB database along the lines of "SQL Server detected a logical consistency-based I/O error".

    This seems to have started around 02:00 on 14/08/17, when a 'Rebuild Index' job failed:


    Date               14/08/2017 02:00:00
    Log                Job History (Epro Index Rebuilding)

    Step ID            1
    Server             EPROLIVESQL
    Job Name           Epro Index Rebuilding
    Step Name          Rebuild Indexes
    Duration           00:11:46
    Sql Severity       24
    Sql Message ID     3621
    Operator Emailed
    Operator Net sent
    Operator Paged
    Retries Attempted  0

    Message
    Executed as user: NT AUTHORITY\SYSTEM.
    SQL Server detected a logical consistency-based I/O error: incorrect pageid (expected 1:20529816; actual 0:0). It occurred during a read of page (1:20529816) in database ID 7 at offset 0x00002728530000 in file 'F:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\EproPat.mdf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online. [SQLSTATE HY000] (Error 824)
    Executing ALTER INDEX ux_patients_history_versions ON dbo.t_patients_history REBUILD - fragmentation currently 30.2339% [SQLSTATE 01000] (Error 0)
    Executing ALTER INDEX in_clinic_appointments_patient ON dbo.t_clinic_appointments REBUILD - fragmentation currently 29.7339% [SQLSTATE 01000] (Error 0)
    Executing ALTER INDEX ux_patients_history_previousVersions ON dbo.t_patients_history REBUILD - fragmentation currently 29.0101% [SQLSTATE 01000] (Error 0)
    The statement has been terminated. [SQLSTATE 01000] (Error 3621).
    The step failed.

    I've imported the latest backup onto a test SQL instance, run DBCC CHECKDB, and attached the results.

    Not sure what has caused this, or if I can repair it easily!
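
    For reference, the consistency check on the restored copy would be along these lines (the WITH options shown are a common choice, not taken from the thread):

    ```sql
    -- Full consistency check on the restored test copy.
    -- NO_INFOMSGS suppresses informational output; ALL_ERRORMSGS reports every error found.
    DBCC CHECKDB (N'EproPat') WITH NO_INFOMSGS, ALL_ERRORMSGS;
    ```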

  • I have also just run REPAIR_ALLOW_DATA_LOSS on the test instance of the DB and it came back with the attached results... but I can't tell whether any actual user data was lost as part of this repair?
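
    For anyone following along, a repair of that kind is typically run on the test copy only, something like this (a sketch; REPAIR_ALLOW_DATA_LOSS requires single-user mode and can deallocate damaged pages outright):

    ```sql
    -- TEST COPY ONLY: repair options require single-user mode.
    ALTER DATABASE [EproPat] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

    -- ALLOW_DATA_LOSS may delete whatever it cannot fix, hence the warnings in this thread.
    DBCC CHECKDB (N'EproPat', REPAIR_ALLOW_DATA_LOSS);

    ALTER DATABASE [EproPat] SET MULTI_USER;
    ```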

  • Yes. Data was lost.

    Repairing, especially with REPAIR_ALLOW_DATA_LOSS, is a last resort for when there's no other way of fixing things. The best method is to restore from a clean backup; with a proper backup strategy, you won't lose any data.
    When did CheckDB last run without error?
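
    One way to answer that question is to read the last-clean-CHECKDB date recorded in the database boot page (DBCC DBINFO is undocumented but widely used for exactly this):

    ```sql
    -- Send DBCC output to the client session instead of the error log.
    DBCC TRACEON (3604);

    -- In the output, the dbi_dbccLastKnownGood field shows when CHECKDB last completed cleanly.
    DBCC DBINFO (N'EproPat');

    DBCC TRACEOFF (3604);
    ```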

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

  • Many thanks for the replies,

    We do have good backups of the database, and the DB is in Full recovery mode so hourly transaction log backups are generated. So is it a case of restoring the last known good 'full' database backup, then restoring all the transaction log backups taken after that point, up to the most recent? (Not had to do something like this in a very long time!) 🙂

    Yes. If you have a good full backup (you may want to restore it on your test server and run DBCC CheckDB to ensure it's clean), start by restoring that WITH NORECOVERY, then (assuming you don't have any differential backups) restore the transaction log backups up to the point where you're sure the database is consistent. The thorough approach is going to be time consuming, but worth it: restore the full backup WITH RECOVERY and run DBCC CheckDB. Then drop the database, restore the full backup WITH NORECOVERY, restore the first t-log backup WITH RECOVERY, and run DBCC CheckDB again. Drop the database and repeat, adding one more log backup each time, until DBCC CheckDB reports a consistency or allocation error. I think the error output indicates a time, or you can look it up from the page it reports as damaged, and then you know where to stop the recovery: either at the log backup before the problem occurs, or within the log backup where it occurs, restored up to a time before the problem.
    The restore options (and how to deal with DBCC CheckDB errors when encountered) can be easily found online.

    Alan H
    MCSE - Data Management and Analytics
    Senior SQL Server DBA

    Best way to ask a question: http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • john.round - Friday, August 18, 2017 2:47 AM

    Many thanks for the replies,

    We do have good backups of the database, and the DB is in Full recovery mode so hourly transaction log backups are generated. So is it a case of restoring the last known good 'full' database backup, then restoring all the transaction log backups taken after that point, up to the most recent? (Not had to do something like this in a very long time!) 🙂

    Yes, that's it:
    1. Restore from last known 'good' backup.
    2. Restore every transaction log backup taken since then, including log backups taken from the corrupted database before you shut it down.

    Please continue taking transaction log backups on the damaged database while you prepare your restore. It has suffered significant damage, but you can recover everything, including all changes made to the database since, as long as you keep taking log backups.

    Disk corruption in data files (at least the kind shown in your CHECKDB output) is not written to the transaction log, unless the log file was also damaged in the incident. Because the transaction log does not contain instructions on how to re-create the damage to the data files, restoring from a full backup taken before the incident and applying log backups from before and after the incident won't re-introduce the damage.
    Even if the log file was damaged, as long as the damage occurred in an area of the log that had already been backed up, there will be no data loss.

    If the event also wiped your backups, then you're pretty much out of options.

    Eddie Wuerch
    MCM: SQL

  • john.round - Friday, August 18, 2017 2:47 AM

    Many thanks for the replies,

    We do have good backups of the database, and the DB is in Full recovery mode so hourly transaction log backups are generated. So is it a case of restoring the last known good 'full' database backup, then restoring all the transaction log backups taken after that point, up to the most recent? (Not had to do something like this in a very long time!) 🙂

    Yup, that's pretty much it. Take a final log backup before you start, and restore that as the final log restore.

    Gail Shaw
  • SQL_Hacker - Friday, August 18, 2017 1:24 PM

     The thorough approach is going to be time consuming, but worth it: restore the full backup WITH RECOVERY and run DBCC CheckDB. Then drop the database, restore the full backup WITH NORECOVERY, restore the first t-log backup WITH RECOVERY, and run DBCC CheckDB again. Drop the database and repeat, adding one more log backup each time, until DBCC CheckDB reports a consistency or allocation error. I think the error output indicates a time, or you can look it up from the page it reports as damaged, and then you know where to stop the recovery: either at the log backup before the problem occurs, or within the log backup where it occurs, restored up to a time before the problem.

    There's no need to do that.
    That kind of 'creep forward' is needed for recovering from data loss (dropped table, accidental delete, etc). For corruption, in the vast majority of cases you can just restore all log backups up to the most recent, because a misbehaving I/O subsystem (the most common cause of corruption) doesn't write transaction log records for what it mangled.

    Gail Shaw
  • Thanks Gail. I obviously didn't read the original question well enough. Thanks for catching that.

    Alan H

  • I've had to go back a while, to the full database backup from 08/08/17 00:48, which passes DBCC CheckDB with no consistency errors.

    So just to confirm: the first log backup after that is at 01:00, then on the hour every hour up until the most recent (at time of posting, 14:00), then the tail log. I am assuming the daily full database backups taken at 00:48 from 09/08/17 onwards won't have any impact on the transaction log restores? (Hopefully that makes sense!)
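
    If it helps to double-check the chain before starting, the backup history recorded in msdb on the live server lists every backup in order (standard msdb tables; database name as in the thread):

    ```sql
    -- List all backups for the database in chronological order.
    -- type: D = full, I = differential, L = transaction log.
    SELECT b.backup_start_date,
           b.type,
           b.first_lsn,
           b.last_lsn,
           mf.physical_device_name
    FROM msdb.dbo.backupset AS b
    JOIN msdb.dbo.backupmediafamily AS mf
      ON mf.media_set_id = b.media_set_id
    WHERE b.database_name = N'EproPat'
    ORDER BY b.backup_start_date;
    ```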

  • john.round - Monday, August 21, 2017 7:45 AM

    So just to confirm: the first log backup after that is at 01:00, then on the hour every hour up until the most recent (at time of posting, 14:00), then the tail log. I am assuming the daily full database backups taken at 00:48 from 09/08/17 onwards won't have any impact on the transaction log restores? (Hopefully that makes sense!)

    Correct.

    I would recommend, once this is fixed, scheduling more frequent CheckDB runs. At least once a week is my preference.
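
    If a weekly full check is too heavy for a 160GB database, a common compromise (an aside, not from the thread) is a lighter physical-only pass between full checks:

    ```sql
    -- PHYSICAL_ONLY checks page and allocation integrity without the full logical checks,
    -- so it runs faster and would still have caught the Error 824 page damage seen here.
    DBCC CHECKDB (N'EproPat') WITH PHYSICAL_ONLY, NO_INFOMSGS;
    ```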

    Gail Shaw
  • So, I successfully tested this yesterday on a test instance using the following script (as there are over 250 transaction log files!):

    RESTORE DATABASE [EproPat] FROM DISK = 'D:\tmp\EproPat_backup_2017_08_08_000106_1477461.bak' WITH NORECOVERY, REPLACE
    RESTORE LOG [EproPat] FROM DISK = 'D:\tmp\EproPat_backup_2017_08_08_010007_1198079.trn' WITH NORECOVERY
    RESTORE LOG [EproPat] FROM DISK = 'D:\tmp\EproPat_backup_2017_08_08_020006_9747574.trn' WITH NORECOVERY
    RESTORE LOG [EproPat] FROM DISK = 'D:\tmp\EproPat_backup_2017_08_08_030005_2635503.trn' WITH NORECOVERY
    (and so on for all the other transaction logs up until the latest one)
    RESTORE DATABASE [EproPat] WITH RECOVERY
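
    Rather than listing 250+ files by hand, the RESTORE LOG statements could also be generated from the backup history in msdb (a sketch; it assumes msdb on the source server recorded the backups and the files still sit at the recorded paths):

    ```sql
    -- Build one RESTORE LOG statement per log backup taken after the chosen full backup.
    -- Copy the result set into a query window, append the final WITH RECOVERY, and run it.
    SELECT N'RESTORE LOG [EproPat] FROM DISK = N''' + mf.physical_device_name + N''' WITH NORECOVERY'
    FROM msdb.dbo.backupset AS b
    JOIN msdb.dbo.backupmediafamily AS mf
      ON mf.media_set_id = b.media_set_id
    WHERE b.database_name = N'EproPat'
      AND b.type = 'L'                              -- log backups only
      AND b.backup_start_date > '2017-08-08T00:48:00'
    ORDER BY b.backup_start_date;
    ```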

    So, on the live system: if I wait for, say, the midnight (00:00) transaction log backup to complete and then kick off the restore script at 00:01, is there no need for a 'tail log' backup? (I chose midnight as an example because nobody would be using the system at that time!) Then once the database was up and running again, it would be consistent as of the latest 00:00 transaction log?

  • The tail-log backup is there to ensure that absolutely nothing is missed; taking it also switches the source DB into the RESTORING state, so that nothing more can be done to it.
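
    That final tail-log backup would look something like this (the path is hypothetical; NO_TRUNCATE lets it run even against a damaged database, and NORECOVERY leaves the source in the RESTORING state):

    ```sql
    -- Capture the tail of the log and freeze the source database (RESTORING state).
    BACKUP LOG [EproPat]
    TO DISK = N'D:\tmp\EproPat_tail.trn'   -- hypothetical destination
    WITH NORECOVERY, NO_TRUNCATE;
    ```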

    Gail Shaw
