How to deal with Mirroring alerts (and prevent disaster)?

  • Hi All,

    I've seen a lot of postings about setting up alerts on mirroring, but now that's done... how to deal with them?

    Some years ago I set up mirroring for a customer for whom I do remote administration: 1 database, Principal > Mirror, including a Witness for automatic failover.

    The DB size is 800+ GB; it is used for marketing purposes by 10 web servers, which spread the internet traffic.

    The servers have 16 cores and 16 GB of memory, and normally they seem to handle high volumes of transactions just fine, but sometimes it goes wrong...

    In November 2014 I set up the relevant alerts, and since then I frequently receive them by mail.

    The most frequent one looks like this:

    DESCRIPTION: The SQL Server performance counter 'Transaction Delay' (instance '_Total') of object 'SQLServer:Database Mirroring' is now above the threshold of 550.00 (the current value is 8872.35)

    I get this a few days per week; last week it was up to 20 times per day.

    This tells me that the Principal is waiting for the transactions to 'land' on the Mirror, and so are all clients...

    Other alerts I receive, but far less frequently:

    - Principal - Log Send Queue KB (threshold 0), a few times per month

    - Mirror - Redo Queue KB (threshold 250), once per month or less
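
    To get an overview of these alerts without logging on, the mails themselves could be tallied per day and per counter. Below is a minimal Python sketch of that idea. It assumes the mails have been exported as (date, description) pairs; the names parse_alert and summarize are made up for illustration, and the pattern is modelled only on the alert text quoted above, so other alert layouts may not match.

```python
import re
from collections import Counter

# Pattern modelled on the single alert text quoted in this post.
# Assumption: other SQL Server Agent alert mails use the same wording.
ALERT_RE = re.compile(
    r"performance counter '(?P<counter>[^']+)' "
    r"\(instance '(?P<instance>[^']+)'\) "
    r"of object '(?P<object>[^']+)' "
    r"is now above the threshold of (?P<threshold>\d+(?:\.\d+)?) "
    r"\(the current value is (?P<value>\d+(?:\.\d+)?)"
)


def parse_alert(description):
    """Return the alert fields as a dict, or None if the text does not match."""
    m = ALERT_RE.search(description)
    if m is None:
        return None
    return {
        "counter": m.group("counter"),
        "instance": m.group("instance"),
        "object": m.group("object"),
        "threshold": float(m.group("threshold")),
        "value": float(m.group("value")),
    }


def summarize(alerts):
    """Count alerts and track the worst value per (day, counter).

    alerts: iterable of (day, description) pairs (hypothetical export format).
    """
    counts = Counter()
    worst = {}
    for day, description in alerts:
        parsed = parse_alert(description)
        if parsed is None:
            continue  # skip mails that are not performance-counter alerts
        key = (day, parsed["counter"])
        counts[key] += 1
        worst[key] = max(worst.get(key, 0.0), parsed["value"])
    return counts, worst
```

    Sorting the resulting counts by day would show whether 'Transaction Delay' alerts cluster on certain days or hours, which might help to correlate them with backups, index maintenance or network events.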

    Of course it would be easy to switch to asynchronous mirroring instead of synchronous, but automatic failover was precisely the reason for choosing the current setup. Besides, in my opinion this should work.

    However, I can't find clues in the SQL Server and Windows logs about a possible cause. I did experience 2 cases in which this alert started firing continuously (mails every minute): the Mirror 'hung' and so did the Principal, for a period of minutes (up to 30!). That sets off a new chain reaction:

    Users of the websites start calling my customer's customer, the customer's customer calls my customer, and eventually my customer calls me. Of course I had already seen the rain of mails, but I was on location somewhere else or, even better, on holiday 🙂

    Restarting the SQL Server service on the Mirror solved both cases...

    The first thing that came to mind was: with mirroring the availability should go up, but even when the Principal itself looks fine, it hangs BECAUSE of the mirroring.

    Second came the question: what is causing this?

    I 'feel' that it has to do with networking issues, but I cannot find evidence for that.

    What makes it harder for me is that most of the time when the alerts are raised I am busy at other locations, and so not able to log on to the remote site.

    My customer knows that; the support and price are based, among other things, on my limited availability.

    Still, I see it as my job to prevent this from happening whenever possible. But how?

    Does anybody have ideas on how to tackle this?
