"Backup detected log corruption in database. Context is Bad End Sector"

  • Two clustered instances, one SQL2008 SP2, one SQL2008 R2, both currently running on the same node of the cluster. SAN attached.

    At the same time, both reported transaction log backup errors:

    On named instance N2:

    Backup detected log corruption in database MyN2DB. Context is Bad End Sector. LogFile: 2 'K:\MSSQL10.N2\MSSQL\Logs\MyN2DB_log.LDF' VLF SeqNo: x5513 VLFBase: x91fb0000 LogBlockOffset: x937ae000 SectorStatus: 4 LogBlock.StartLsn.SeqNo: x5513 LogBlock.StartLsn.Blk: xbfef Size: x400 PrevSize: x400

    On named instance N4:

    Backup detected log corruption in database MyN4DB. Context is Bad End Sector. LogFile: 2 'O:\MSSQL10_50.N4\MSSQL\Data\MyN4DB_log.LDF' VLF SeqNo: xadb VLFBase: x11000000 LogBlockOffset: x1143d200 SectorStatus: 4 LogBlock.StartLsn.SeqNo: xadb LogBlock.StartLsn.Blk: x21c6 Size: x4800 PrevSize: x3200

    DBCC returns nothing (which apparently is normal, according to this blog post by Paul Randal).

    Log backups since then have been running successfully, as have full backups.

    From what I have always read here at SSC in posts related to corruption, it is most likely due to IO subsystem issues. Is there any more information I can give our SAN guys from the error messages above? Can anyone explain to me a little about what is going on here? Google isn't helping me with this (or, more likely, I don't know the best terms to search for).

    The logs for these two instances each have their own drive - would this double corruption occurrence at exactly the same time suggest their LUNs are somehow connected? I'm pretty new to dealing with hardware, I apologise if this is a stupid question.

    As for the solution to this, I understand from Gail Shaw's post here the steps to go through to correct this, but if it's an underlying IO subsystem error, that's not going to go away, is it? Until the SAN guys fix the underlying error, what can I do to keep the data safe? And can I trust my full backups as things stand now?

    And another newbie question - Given the instances are on a cluster, is there any point failing the databases over to the other node? Presumably a shared drive is just a shared drive and would be the same on either node?

    Many thanks for your time

    ------------------------------------------------------------------------
    Bite-sized fiction (with added teeth)

  • What are you using to do log backups? Are those native backups or some 3rd party tool?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

  • These are native backups. DPM is also used, although for some reason that is not part of the DBA team's responsibilities, so I don't know much about it.

  • Not an error I'm familiar with. Let me consult an expert.

    Gail Shaw

  • Thanks Gail, much appreciated

  • Gail asked me to offer advice.

    It's log corruption on disk (or potentially from a memory scribbler - hopefully the former).

    If I were on the system, I'd do the following to allow log backups to continue: switch to SIMPLE, checkpoint to clear out the log, switch back to FULL, then take a full backup to restart the log backup chain. You may run into difficulties if the log corruption is pervasive, but this would be the first course of action I'd try. I'd also run full IO subsystem and memory diagnostics, and probably SQLIOSim on the IO subsystem too.
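
    The sequence described above can be sketched in T-SQL roughly as follows. This is a minimal sketch, not a tested script: MyN2DB is the database named in the error message, the backup path is a hypothetical placeholder, and WITH CHECKSUM is an optional addition that was not part of the advice itself.

    ```sql
    -- Sketch only. MyN2DB comes from the error message in this thread;
    -- the backup path below is a hypothetical placeholder.

    -- 1. Switch to SIMPLE recovery so the log can be cleared without a log backup.
    USE master;
    GO
    ALTER DATABASE MyN2DB SET RECOVERY SIMPLE;
    GO

    -- 2. CHECKPOINT applies to the current database, so switch context first.
    USE MyN2DB;
    GO
    CHECKPOINT;
    GO

    -- 3. Switch back to FULL recovery.
    USE master;
    GO
    ALTER DATABASE MyN2DB SET RECOVERY FULL;
    GO

    -- 4. Take a full backup to restart the log backup chain.
    --    WITH CHECKSUM is an optional extra, not part of the original steps.
    BACKUP DATABASE MyN2DB
    TO DISK = N'X:\Backups\MyN2DB_full.bak'  -- hypothetical path
    WITH CHECKSUM;
    GO
    ```

    Switching to SIMPLE breaks the existing log backup chain, so point-in-time restore is not available for the period between the last good log backup and the new full backup. Log backups only become possible again once step 4 has completed.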

    Hope this helps.

    Paul Randal
    CEO, SQLskills.com: Check out SQLskills online training!
    Blog: www.SQLskills.com/blogs/paul Twitter: @PaulRandal
    SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
    Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005

  • Thanks Paul

    I'll run the FULL>SIMPLE>FULL switch and hopefully get the ball rolling on the rest of it.

  • Another question, if I may - the logs of those databases have been successfully backing up since the failed one.

    If they are still corrupt, how come all the rest of the log backups aren't throwing the same error?

  • The corruption was transient - i.e. the SAN returned bad data when the data on disk was actually good. The subsequent reads worked. This is not uncommon.

    Paul Randal

  • Thanks again Paul.

    There's a lot more reading that I need to do...
