Poor SQL Performance

  • I'm hoping some folks might be able to help out with a poor performance situation for our SQL server.

    We recently did a major system upgrade from SQL 2008 and Windows 2008 and Xenserver to:

    SQL 2012 R2 (build 11.0.6020) 21GB max memory, 24GB total

    Windows Server 2012 R2

    Xenserver 6.5 (build 6.5.2.2477)

    After upgrading we are getting user complaints and noticeable slowness\time outs at our more peak times of use. Looking at DMV's most of our wait times center around the following:

    BACKUPIO

    WRITELOG

    ASYNC IO COMPLETION

    Looking at the virtual file stats DMV I am noticing read and write latency across multiple files:

    Master DB Log File - Read Latency 191ms, Write Latency 8ms

    Temp DB Data File - Read Latency 21 ms, Write Latency 98ms

    Production DB Data File - Read Latency 121ms, Write Latency 38ms

    Production DB Log File - Ready Latency 331ms, Write Latency 12ms

    The data and log files are on separate RAID 10 disk groups on the SAN - however the tempdb currently resides in the same disk group as the data drive does - however we don't have another disk group to allocate tempdb to - only other option would be the disk group where all the server OS's are located.

    I'm looking for some advice to narrow down where the problem might be at - whether SQL or SAN/Citrix Performance Issues.

    Thanks,

  • j21283 (5/25/2016)


    I'm hoping some folks might be able to help out with a poor performance situation for our SQL server.

    We recently did a major system upgrade from SQL 2008 and Windows 2008 and Xenserver to:

    SQL 2012 R2 (build 11.0.6020) 21GB max memory, 24GB total

    Windows Server 2012 R2

    Xenserver 6.5 (build 6.5.2.2477)

    After upgrading we are getting user complaints and noticeable slowness\time outs at our more peak times of use. Looking at DMV's most of our wait times center around the following:

    BACKUPIO

    WRITELOG

    ASYNC IO COMPLETION

    Looking at the virtual file stats DMV I am noticing read and write latency across multiple files:

    Master DB Log File - Read Latency 191ms, Write Latency 8ms

    Temp DB Data File - Read Latency 21 ms, Write Latency 98ms

    Production DB Data File - Read Latency 121ms, Write Latency 38ms

    Production DB Log File - Ready Latency 331ms, Write Latency 12ms

    The data and log files are on separate RAID 10 disk groups on the SAN - however the tempdb currently resides in the same disk group as the data drive does - however we don't have another disk group to allocate tempdb to - only other option would be the disk group where all the server OS's are located.

    I'm looking for some advice to narrow down where the problem might be at - whether SQL or SAN/Citrix Performance Issues.

    Thanks,

    I can definitively declare that your IO performance SUCKS. I can't, however, tell you WHY. 😎

    With a SAN and VM in play it could be a BUNCH of things. Do you know EXACTLY what underlies that RAID10 set you are using? If not find out. What else resides on the same spindles. Did you test the disk speed (or did the SAN people)?

    Why just 24GB RAM?? You seem have more than 24GB database, and SQL 2012 Standard Edition can use 64GB. Maxing out RAM is the single easiest and very inexpensive way to help IO performance.

    What types of total numbers are you seeing on file IO stalls in a differential analysis?

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Hi Kevin,

    Thanks for your response - this was where I was leaning as well - I will say before the upgrade of these various components our performance seemed to be better - although I don't have actual numbers the responsive of our application seemed much better before the switchover.

    We are actually currently investigating purchasing new hardware as we are about at the end of our life cycle for our SAN and Virtual Server Physical Hosts.

    Our SAN is the following: PowerVault MD3200 iSCSI SAN Storage - dual controller - each controller has 4 1Gb Ethernet ports.

    There is actually nothing else on those same spindles - these disks are only used for SQL and nothing else - which is why the performance is so surprising.

    Our database is about 40GB in size - so I could see if we have additional memory on the virtual host we could allocate to help alleviate the pressure for now. We don't have a dedicated person for our SAN or someone who knows much about it - it was contracted with some folks about 5 years ago - so I'd have to contact a consultant to have them take a look - so just trying to gather data to make the case for spending some money.

    How would I go about the IO stall differential analysis?

  • j21283 (5/25/2016)


    Hi Kevin,

    Thanks for your response - this was where I was leaning as well - I will say before the upgrade of these various components our performance seemed to be better - although I don't have actual numbers the responsive of our application seemed much better before the switchover.

    We are actually currently investigating purchasing new hardware as we are about at the end of our life cycle for our SAN and Virtual Server Physical Hosts.

    Our SAN is the following: PowerVault MD3200 iSCSI SAN Storage - dual controller - each controller has 4 1Gb Ethernet ports.

    There is actually nothing else on those same spindles - these disks are only used for SQL and nothing else - which is why the performance is so surprising.

    Our database is about 40GB in size - so I could see if we have additional memory on the virtual host we could allocate to help alleviate the pressure for now. We don't have a dedicated person for our SAN or someone who knows much about it - it was contracted with some folks about 5 years ago - so I'd have to contact a consultant to have them take a look - so just trying to gather data to make the case for spending some money.

    How would I go about the IO stall differential analysis?

    This company sounds like my typical client. 😀

    Take a snapshot of file IO stall dmv, wait a while (I usually do 3 or 5 mins during a period of interesting activity), take another snapshot and diff them and divide by time. Voila - average ms per IO.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Ah ok thought I was perhaps missing something for the analysis - I've ran a few scripts related to this and this is what we get:

    TempDB avg io stall ms: 65.5 ms

    Prod avg io stall ms: 94.3ms

  • j21283 (5/25/2016)


    Ah ok thought I was perhaps missing something for the analysis - I've ran a few scripts related to this and this is what we get:

    TempDB avg io stall ms: 65.5 ms

    Prod avg io stall ms: 94.3ms

    Not really helpful there, right? If those numbers are for 20 reads/writes who cares. If they are for 20000 in 180 seconds I care a LOT because the total slowdown (20000*94ms) is a telephone number. Also you must always split out reads and writes.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Ah ok that makes sense to track seperate - after benchmarking for a bit today if I'm calculating that right it doesn't seem like it is that bad right now? I've attached a spreadsheet with my calculations.

  • Again not enough information. What was the monitoring interval? If it was 30 seconds you are doing very poorly. If it was 180 seconds then you are waiting one second of clock time for IO per second of app time. That's not good in my book, especially since writes are high. If you monitored for 10 minutes then not too bad on the whole but things could still seem slowish if the actual write stalls were clustered together.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Hi Kevin,

    The elapsed_time column is the interval between the baseline I took and the next polling. So the first few lines in the excel sheet would be about 6 minutes of polling time it looks like.

  • So about 400ms IO waits per 1 second clock time. I call that not good. If your perf is acceptable then maybe you can work on other things.

    Have you tuned your queries and your system(s)? IO slowness is often a result of poor structures, poor indexing, poor code, poor maintenance, poorly configured hardware/VM, etc, etc.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • j21283 (5/25/2016)


    I'm hoping some folks might be able to help out with a poor performance situation for our SQL server.

    We recently did a major system upgrade from SQL 2008 and Windows 2008 and Xenserver to:

    SQL 2012 R2 (build 11.0.6020) 21GB max memory, 24GB total

    Windows Server 2012 R2

    Xenserver 6.5 (build 6.5.2.2477)

    After upgrading we are getting user complaints and noticeable slowness\time outs at our more peak times of use. Looking at DMV's most of our wait times center around the following:

    BACKUPIO

    WRITELOG

    ASYNC IO COMPLETION

    Looking at the virtual file stats DMV I am noticing read and write latency across multiple files:

    Master DB Log File - Read Latency 191ms, Write Latency 8ms

    Temp DB Data File - Read Latency 21 ms, Write Latency 98ms

    Production DB Data File - Read Latency 121ms, Write Latency 38ms

    Production DB Log File - Ready Latency 331ms, Write Latency 12ms

    The data and log files are on separate RAID 10 disk groups on the SAN - however the tempdb currently resides in the same disk group as the data drive does - however we don't have another disk group to allocate tempdb to - only other option would be the disk group where all the server OS's are located.

    I'm looking for some advice to narrow down where the problem might be at - whether SQL or SAN/Citrix Performance Issues.

    Thanks,

    What kind of upgrade was this?

    Backup/Restore?

    Upgrade Wizard?

    Alex S
  • Ah ok I see how you got to that calculation now - I think with this information I'm going to see if I can get a resource to look at the Xen configuration and SAN items - I have a feeling this issue lies in the way our updated Xen environment is configured as the latency seemed better on the older SQL/OS/Xen platform - we do index/stat maintenance and also look for high read/write queries to try and tamper them down.

    Thanks for your help!

  • Hi Alex - we provisioned an entirely new server and installed SQL 2012 fresh and then backed up and restored the databases to the new server.

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply