Home Forums SQL Server 2012 SQL 2012 - General Mirroring monitor:" "current restore rate" low (100 KB/s) while "unrestored log" enourmous (100's GB) RE: Mirroring monitor:" "current restore rate" low (100 KB/s) while "unrestored log" enourmous (100's GB)

  • Theo Ekelmans (12/27/2013)


    hi.

    I have a strange SQL 2012 mirror issue, on a rather large HP DL980 server (32 CPU's, 256 GB memory), on which I mirror about 40 DB's.

    38 of them work just fine, but for some reason the mirror just cant keep up with 2 DB's.

    The disks should be able to cope with the load, since the work on the server is well spread over several disks;

    2 Fusion IO's (PROD DB and temp),

    25 15K SAS disks raid-10 6TB array (ACC DB's)

    25 10K SATA disks raid-5 18TB array (DWH and backups, budget choiche)

    I know that he last is a slower setup, but it's just used for backup and DWH databases, and two of those DB's happen to be my problem DB's; 700 GB and 170 GB.

    There is no excessive queuing on the disks; i can still copy a backup on the same disk from folder to folder and do >80MB/sec, hence; there is enough disk IO left to process the mirror logs faster then the current 100KB/sec'ish speeds.

    CPU load over all CPU is low on the mirror, about 4% average, but i can see 2 CPU's running at 100% load.... hmm... 2 problem DB's, 2 CPU's at 100%.... suspicious..... 😎

    I'm, nowhere near the limits of mirroring and the number of working threads is no issue either, i even tried to increase it to 1500, just to sure. (http://support.microsoft.com/kb/2001270)

    So I have no clue why those 2 DB's are running so far behind in the log restores.

    Does anyone have any ideas, or hints?

    Theo (NL)

    From what I understand you get ONE thread per database for the tlog read/send on the primary side and ONE thread per database for tlog hardening/replay on the secondary side. My experience and in talking with others is that you get a maximum of maybe 50MB/sec of activity with that one thread, and that is with FusionIO on both ends.

    I assume you know it is those two databases that are slow? Given the discussion above I am not sure that you do. It isn't the SIZE of a database that determine's how much mirroring load will be needed - it is the volume of transactions and the amount of tlog generated for those transactions that matter. I can take a 100MB database and saturate mirroring with my laptop generating load.

    I would do a profiler trace to see what activity is happening on each primary database so you can find the heavy hitters. I do note that ETL processes on a DWH can EASILY blow out tlog volumes, especially if you do an index defrag process after a load. I have had to change a number of clients over to tlog shipping from mirroring on their EDWs because of that.

    I would also do file IO stall and wait stats analysis on the secondary, as well as run some DMVs to find out what activity is consuming those 2 CPUs. Shouldn't be hard to find the culprits.

    Without detailed data you are simply hunting and pecking and guessing. Not a good way to troubleshoot ... 😉

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service