SQL 2012 - Mirroring - Performance issue after failover

  • Hello, I encountered a serious performance issue after failing over in production today. I would love any insights or ideas you can share from your experience in similar situations.

    Context:

    - Running SQL Server 2012 R2

    - Configured with full safety

    - Configured with Mirroring & automatic failover

    - Identical hardware between principal and mirror

    Initial incident: Connectivity issue between clients and the principal database forced us to do a manual failover to the mirror.

    Followup incident: After manual fail-over, performance on the mirror db was abysmal:

    - 100% CPU

    - 1100 threads

    - Web site response time was very slow / many queries timing out

    - etc.

    At this time, the incoming load was no different on the mirror than what is 'typical' for the principal. We were able to resolve the problem by manually turning services off on our web frontends to reduce incoming traffic, then gradually throttling back on. Proceeding this way, the database was able to handle ongoing load at its more typical 60% CPU threshold.

    Have others experienced this type of problem after failing over when the database is under peak load? Is it a best practice to gradually throttle load after failing over? If so, have others found tools to do this within SQL, or only upstream of SQL?

    Thanks!

  • What form of troubleshooting did you do on the database server itself??

    1) use sp_whoisactive to find causes of calls not completing in a timely manner (locking/blocking, system processes running, query plans, tempdb usage, waits for non-running queries, etc)

    2) do a differential waits analysis

    3) do a file IO stall analysis

    Every client I every engage with is taught to do those 3 things immediately and without thought when a SQL Server performance problem is noted. The VAST majority of the time one or more of them will point to a root cause of the current issue. A key point is that SQL Server perf issues are often TRANSIENT in nature and if you don't capture metrics ASAP you may not be able to identify a root cause.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Oh, the MOST obvious possible cause: Your Secondary is a fraction of the power (CPU/RAM/IO/etc) that your Primary is!! :hehe:

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (4/28/2015)


    Oh, the MOST obvious possible cause: Your Secondary is a fraction of the power (CPU/RAM/IO/etc) that your Primary is!! :hehe:

    From the original post:

    - Identical hardware between principal and mirror

    I would look at the power settings in the BIOS between the two servers.

  • Lynn Pettis (4/28/2015)


    TheSQLGuru (4/28/2015)


    Oh, the MOST obvious possible cause: Your Secondary is a fraction of the power (CPU/RAM/IO/etc) that your Primary is!! :hehe:

    From the original post:

    - Identical hardware between principal and mirror

    I would look at the power settings in the BIOS between the two servers.

    GAH!! Completely missed that! Good catch. :blink:

    Windows Server also defaults to Balanced Power mode, which can be a big throttle.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Thanks for the responses. These instances are actually running in AWS, they are identically configured.

    Kevin, thanks for your tips on performance analysis. Are you referring to something like:

    differential waits analysis: http://www.sqlservercentral.com/articles/baselines/96270/

    file IO stall analysis: http://sqlmag.com/blog/tracking-live-sql-server-file-stalls

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply