Repair Corruption Using the Mirror Database in SQL 2005

  • Comments posted to this topic are about the item Repair Corruption Using the Mirror Database in SQL 2005

  • Good article.

    In problem resolution I often find techniques that work on small databases do not scale that well to VLDBs.

    Could another solution have been to simply update (on the principal) the rows that were corrupted? Update them in a manner that doesn't really change the data, and then update again to remove the change. Alternatively, just script out the data in the row, delete it, and re-insert it.

  • Hmm - there are some nasty potential problems with doing what you describe and I would not recommend it in production:

    1) what if the clustered index being rebuilt is very large? How to cope with the resulting potential backlog of transactions on the principal, and probable large REDO queue on the mirror? What about the transaction log growth on the principal from having to cope with the fully-logged index rebuild? What about the knock-on effect on log shipping, transactional replication, etc?

    2) what if the I/O subsystem on the (new) mirror is damaged and the rebuild cannot be replayed? What do you suggest as the way forward if the mirror stops with a failure during replay of one of the log records from the index rebuild?

    And apart from that, you don't go into details of how to make sure the problem won't happen again (i.e. root cause analysis of the original failure).

    Depending on database size and network bandwidth, my recommendation may be to break the mirroring partnership, do root-cause analysis to make sure the I/O subsystem on the old principal is sound, and then re-initialize the partnership.

    It's a neat idea that you're proposing, but you need to think through all the consequences and potentialities for VLDBs and for further failures before recommending to others.

    Thanks

    Paul Randal
    CEO, SQLskills.com: Check out SQLskills online training!
    Blog:www.SQLskills.com/blogs/paul Twitter: @PaulRandal
    SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
    Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005

  • Paul Randal (5/9/2010)


    2) what if the I/O subsystem on the (new) mirror is damaged and the rebuild cannot be replayed? What do you suggest as the way forward if the mirror stops with a failure during replay of one of the log records from the index rebuild?

    Would that send the mirror suspect?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Paul Randal (5/9/2010)


    Hmm - there are some nasty potential problems with doing what you describe and I would not recommend it in production:

    1) what if the clustered index being rebuilt is very large? How to cope with the resulting potential backlog of transactions on the principal, and probable large REDO queue on the mirror? What about the transaction log growth on the principal from having to cope with the fully-logged index rebuild? What about the knock-on effect on log shipping, transactional replication, etc?

    2) what if the I/O subsystem on the (new) mirror is damaged and the rebuild cannot be replayed? What do you suggest as the way forward if the mirror stops with a failure during replay of one of the log records from the index rebuild?

    And apart from that, you don't go into details of how to make sure the problem won't happen again (i.e. root cause analysis of the original failure).

    Depending on database size and network bandwidth, my recommendation may be to break the mirroring partnership, do root-cause analysis to make sure the I/O subsystem on the old principal is sound, and then re-initialize the partnership.

    It's a neat idea that you're proposing, but you need to think through all the consequences and potentialities for VLDBs and for further failures before recommending to others.

    Thanks

    If you have limited bandwidth and a VLDB, then running a backup over the WAN and restoring it along with all the transaction logs would be very time consuming. From a DR perspective it is also dangerous, since you won't have a copy of the data in a DR location during the process.

    There are risks with this solution, but in a lot of environments they are probably worth it compared to breaking your DR process, reinitializing it, and working around business hours for the backup/restore process.

    On some of our larger tables, in the 200 million row range, if we had to do this we would drop all indexes, rebuild the clustered index, and then rebuild the other indexes. It would take 10-20 minutes per index, maybe 60 for the clustered index.

  • @Gail No

    @alex Yes, which is why I said it would depend on database size and network bandwidth. My point was that the potential risks need to be understood before doing this. And you're missing the point about dropping and rebuilding indexes with synchronous mirroring running - essentially all the new indexes would be sent across the wire to the mirror - that may be just as much data, and much slower than reinitializing from a backup.

    Paul Randal
    CEO, SQLskills.com: Check out SQLskills online training!
    Blog:www.SQLskills.com/blogs/paul Twitter: @PaulRandal
    SQL MVP, Microsoft RD, Contributing Editor of TechNet Magazine
    Author of DBCC CHECKDB/repair (and other Storage Engine) code of SQL Server 2005
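
The trade-off described above (index rebuilds shipped through the mirroring session versus re-sending a backup) can be roughed out with a back-of-envelope calculation. The following is only a sketch: the function names, the sizes, the link speed, the 0.7 efficiency factor, and the simplification that a fully logged rebuild generates about as much log as the index is large are all hypothetical assumptions, not figures from this thread.

```python
def transfer_hours(size_gb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Estimate hours needed to move size_gb gigabytes over a link_mbps link.

    efficiency is an assumed fraction of nominal bandwidth actually achieved.
    """
    if size_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, link speed > 0, efficiency in (0, 1]")
    megabits = size_gb * 1024 * 8  # GB -> MB -> megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


def compare_paths(index_sizes_gb, backup_gb, link_mbps, efficiency=0.7):
    """Compare shipping rebuilt indexes via mirroring with re-sending a backup.

    Assumes (hypothetically) that the log generated by each fully logged
    rebuild is about the size of the index itself.
    """
    rebuild_gb = sum(index_sizes_gb)
    return {
        "rebuild_gb": rebuild_gb,
        "rebuild_hours": transfer_hours(rebuild_gb, link_mbps, efficiency),
        "reinit_hours": transfer_hours(backup_gb, link_mbps, efficiency),
    }


if __name__ == "__main__":
    # Hypothetical numbers: one clustered and three nonclustered indexes,
    # a 400 GB backup, and a 100 Mbps WAN link.
    print(compare_paths([120, 40, 30, 25], backup_gb=400, link_mbps=100))
```

An estimate like this ignores backup compression, the REDO rate on the mirror, and competing traffic, so it is only useful for deciding which option deserves a proper test.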

  • I liked the proposed idea to fix the corruption. I would have liked to see more info and some precautions thrown into the article as well. Paul brings up some great concerns. If these had been addressed in the article, I think the article would have been much better. Even a disclaimer stating that one should consider these types of questions before fixing the problem using this method would have helped.

    Thanks for sharing this info.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

  • I agree that I should have included a disclaimer about using this method. However, my intent was to share my experience and a unique solution to a corruption issue. The intent was not to present a cure-all solution.

    The root cause analysis was done by the storage team along with our storage vendors. Since this is outside my expertise, I can only give the 30,000-foot view. It basically boiled down to a disk failure that was not handled correctly by the SAN, causing the disk controller to freeze.

    Rebuilding the 190GB clustered index on a single table was a faster solution (2 hrs) in this particular case than rebuilding mirroring for a 3TB database. As Paul indicated there was a large redo queue on the mirror side. This solution was tested in our QA system to estimate timing and was implemented during a quiet system time.
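
For a sense of scale, the figures quoted above (190 GB rebuilt in 2 hours, 3 TB database) can be turned into a rough comparison. This is a sketch under a loud assumption: it supposes that re-initializing mirroring would move data at the same effective rate as the rebuild did, which the thread does not state. The helper names are hypothetical.

```python
GB_PER_TB = 1024


def implied_rate_gb_per_hour(size_gb: float, hours: float) -> float:
    """Effective throughput implied by moving size_gb in the given hours."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return size_gb / hours


def hours_at_rate(size_gb: float, rate_gb_per_hour: float) -> float:
    """Hours needed to move size_gb at a fixed rate."""
    if rate_gb_per_hour <= 0:
        raise ValueError("rate must be positive")
    return size_gb / rate_gb_per_hour


# Figures from the post: a 190 GB clustered index rebuilt in about 2 hours.
rebuild_rate = implied_rate_gb_per_hour(190, 2)

# Assumption (not from the thread): a 3 TB re-initialization runs at that same rate.
reinit_hours = hours_at_rate(3 * GB_PER_TB, rebuild_rate)

if __name__ == "__main__":
    print(f"implied rate: {rebuild_rate:.1f} GB/hour")
    print(f"3 TB at that rate: {reinit_hours:.1f} hours")
```

Under that assumption the full re-initialization comes out at well over a day, which is consistent with the choice described above, though the real figure would depend on backup compression and the restore speed at the mirror.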
